{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Tutorial 6: Batch Processing\n", "\n", "When you have dozens or hundreds of plates to process, running Python code\n", "one image at a time is not practical. PhenoTypic's **command-line interface**\n", "(CLI) lets you apply a saved pipeline to an entire directory of plate images\n", "with built-in parallelism, checkpointing, and resume.\n", "\n", "**What you will learn:**\n", "\n", "1. Save a pipeline to JSON for CLI use\n", "2. Run batch processing from the command line\n", "3. Resume interrupted jobs\n", "4. Control parallelism and other options" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Step 1: Build and Save a Pipeline\n", "\n", "The CLI needs a saved pipeline JSON file. Let's create one with the\n", "enhance-detect-measure workflow we have been building throughout\n", "these tutorials." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import phenotypic as pht\n", "from phenotypic.enhance import GaussianBlur, CLAHE\n", "from phenotypic.detect import OtsuDetector\n", "from phenotypic.measure import MeasureSize, MeasureShape" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "pipeline = pht.ImagePipeline(\n", " ops=[GaussianBlur(sigma=2.0), CLAHE(clip_limit=0.01), OtsuDetector()],\n", " meas=[MeasureSize(), MeasureShape()],\n", " name=\"batch_pipeline\",\n", ")\n", "pipeline.to_json(\"batch_pipeline.json\")\n", "print(\"Pipeline saved!\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Step 2: Run Batch Processing\n", "\n", "The CLI entry point is `python -m phenotypic`. 
At minimum it needs three\n", "arguments: the pipeline JSON file, the input directory containing your\n", "plate images, and the output directory where results will be saved.\n", "\n", "```bash\n", "python -m phenotypic batch_pipeline.json /path/to/plates/ /path/to/output/\n", "```\n", "\n", "The CLI will:\n", "1. Discover all images in the input directory\n", "2. Apply the pipeline to each one\n", "3. Save processed images, overlays, and measurement CSVs to the output directory\n", "4. Checkpoint progress periodically" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Step 3: Specify Image Type and Grid Dimensions\n", "\n", "For grid plates, tell the CLI to use `GridImage` and specify the grid layout:\n", "\n", "```bash\n", "python -m phenotypic batch_pipeline.json /path/to/plates/ /path/to/output/ \\\n", " --image-type GridImage \\\n", " --nrows 8 --ncols 12 \\\n", " --ext .png\n", "```\n", "\n", "- **`--image-type GridImage`** — load images as GridImage (default is Image)\n", "- **`--nrows 8 --ncols 12`** — 96-well grid layout (8 rows x 12 columns)\n", "- **`--ext .png`** — only process files with this extension" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Step 4: Parallelism\n", "\n", "Process multiple plates at once with `--n-jobs`:\n", "\n", "```bash\n", "python -m phenotypic batch_pipeline.json /path/to/plates/ /path/to/output/ \\\n", " --n-jobs 4\n", "```\n", "\n", "This processes 4 plates at a time. If `--n-jobs` is omitted, the CLI uses all\n", "available CPU cores." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Step 5: Resume Interrupted Jobs\n", "\n", "If a batch job is interrupted (crash, timeout, Ctrl+C), you can resume\n", "from where it left off:\n", "\n", "```bash\n", "python -m phenotypic batch_pipeline.json /path/to/plates/ /path/to/output/ \\\n", " --resume\n", "```\n", "\n", "The CLI reads the checkpoint file in the output directory and skips plates\n", "that were already processed. 
Only unfinished plates are processed.\n", "\n", "To retry plates that previously *failed* (which a plain `--resume` would\n", "otherwise skip):\n", "\n", "```bash\n", "python -m phenotypic batch_pipeline.json /path/to/plates/ /path/to/output/ \\\n", " --resume --retry-failures\n", "```" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Useful Flags\n", "\n", "| Flag | Purpose |\n", "|------|---------|\n", "| `--dry-run` | Validate pipeline and list images without processing |\n", "| `--sample 5` | Process only 5 random images (great for testing) |\n", "| `--checkpoint-interval 50` | Save state every 50 images |\n", "| `--force-local` | Run locally even if SLURM is available |\n", "| `--restart` | Clear all state and start fresh |" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Clean Up" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import os\n", "os.remove(\"batch_pipeline.json\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Summary\n", "\n", "You now know how to scale PhenoTypic from a single plate to hundreds:\n", "\n", "- **Save your pipeline** to JSON with `.to_json()`\n", "- **Run `python -m phenotypic`** with your pipeline, input directory, and output directory\n", "- **`--n-jobs`** controls parallelism\n", "- **`--resume`** picks up where you left off after interruptions\n", "- **`--dry-run`** and **`--sample`** let you test before committing to a full run\n", "\n", "**Next up:** [Tutorial 7: Measuring and Exporting](07_measuring_and_exporting.ipynb) —\n", "extract size, shape, and intensity measurements from detected colonies." ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "name": "python", "version": "3.11.0" } }, "nbformat": 4, "nbformat_minor": 4 }