Tutorial 6: Batch Processing#

When you have dozens or hundreds of plates to process, running Python code one image at a time is not practical. PhenoTypic’s command-line interface lets you apply a saved pipeline to an entire directory of plate images, with built-in parallelism, checkpointing, and resume support.

What you will learn:

  1. Save a pipeline to JSON for CLI use

  2. Run batch processing from the command line

  3. Resume interrupted jobs

  4. Control parallelism and other options

Step 1: Build and Save a Pipeline#

The CLI needs a saved pipeline JSON file. Let’s create one with the enhance-detect-measure workflow we have been building throughout these tutorials.

[1]:
import phenotypic as pht
from phenotypic.enhance import GaussianBlur, CLAHE
from phenotypic.detect import OtsuDetector
from phenotypic.measure import MeasureSize, MeasureShape
[2]:
pipeline = pht.ImagePipeline(
    ops=[GaussianBlur(sigma=2.0), CLAHE(clip_limit=0.01), OtsuDetector()],
    meas=[MeasureSize(), MeasureShape()],
    name="batch_pipeline",
)
pipeline.to_json("batch_pipeline.json")
print("Pipeline saved!")
Pipeline saved!

Step 2: Run Batch Processing#

The CLI entry point is python -m phenotypic. At minimum it needs three arguments: the pipeline JSON file, the input directory containing your plate images, and the output directory where results will be saved.

python -m phenotypic batch_pipeline.json /path/to/plates/ /path/to/output/

The CLI will:

  1. Discover all images in the input directory

  2. Apply the pipeline to each one

  3. Save processed images, overlays, and measurement CSVs to the output directory

  4. Checkpoint progress periodically
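The four steps above can be sketched in plain Python. This is an illustration of the batch loop's shape only, not PhenoTypic's actual implementation; the output file names and the checkpoint format here are assumptions.

```python
import json
from pathlib import Path

def run_batch(input_dir, output_dir, checkpoint_interval=50):
    """Illustrative batch loop: discover, process, checkpoint."""
    input_dir, output_dir = Path(input_dir), Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)
    # 1. Discover all images in the input directory.
    images = sorted(p for p in input_dir.iterdir() if p.suffix == ".png")
    done = []
    for i, image in enumerate(images, start=1):
        # 2-3. In the real CLI this is where the saved pipeline runs and
        # writes processed images, overlays, and measurement CSVs.
        (output_dir / f"{image.stem}_measurements.csv").write_text("placeholder\n")
        done.append(image.name)
        # 4. Checkpoint progress periodically (and at the end).
        if i % checkpoint_interval == 0 or i == len(images):
            (output_dir / "checkpoint.json").write_text(json.dumps({"done": done}))
    return done
```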

Step 3: Specify Image Type and Grid Dimensions#

For grid plates, tell the CLI to use GridImage and specify the grid layout:

python -m phenotypic batch_pipeline.json /path/to/plates/ /path/to/output/ \
    --image-type GridImage \
    --nrows 8 --ncols 12 \
    --ext .png
  • ``--image-type GridImage`` — load images as GridImage (default is Image)

  • ``--nrows 8 --ncols 12`` — 96-well grid layout

  • ``--ext .png`` — only process files with this extension
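As a quick sanity check on the grid flags, an 8 x 12 layout corresponds to the standard 96 well labels A1 through H12. This is plain Python, independent of PhenoTypic:

```python
from string import ascii_uppercase

def well_labels(nrows, ncols):
    """Row letter + column number, e.g. A1..H12 for a 96-well plate."""
    return [f"{ascii_uppercase[r]}{c + 1}"
            for r in range(nrows) for c in range(ncols)]

labels = well_labels(8, 12)
print(len(labels), labels[0], labels[-1])  # → 96 A1 H12
```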

Step 4: Parallelism#

Process multiple plates at once with --n-jobs:

python -m phenotypic batch_pipeline.json /path/to/plates/ /path/to/output/ \
    --n-jobs 4

This runs 4 plates in parallel. By default, the CLI uses all available CPU cores.
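Conceptually, ``--n-jobs`` caps how many plates are in flight at once, like a worker pool. A minimal stand-in using the standard library (``process_plate`` is a hypothetical stub, and a thread pool is used only to keep the sketch simple; CPU-bound image work would normally use processes):

```python
from concurrent.futures import ThreadPoolExecutor

def process_plate(path):
    # Stand-in for running the saved pipeline on one plate image.
    return f"processed {path}"

def run_parallel(paths, n_jobs=4):
    # Up to n_jobs plates are processed concurrently; map() preserves
    # the input order of results regardless of completion order.
    with ThreadPoolExecutor(max_workers=n_jobs) as pool:
        return list(pool.map(process_plate, paths))
```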

Step 5: Resume Interrupted Jobs#

If a batch job is interrupted (crash, timeout, Ctrl+C), you can resume from where it left off:

python -m phenotypic batch_pipeline.json /path/to/plates/ /path/to/output/ \
    --resume

The CLI reads the checkpoint file in the output directory and skips plates that were already processed. Only unfinished plates are reprocessed.

To also retry plates that previously failed (completed plates are still skipped):

python -m phenotypic batch_pipeline.json /path/to/plates/ /path/to/output/ \
    --resume --retry-failures
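The skip/retry decision can be sketched as follows. The checkpoint schema used here (``{"done": [...], "failed": [...]}``) is an assumption for illustration, not PhenoTypic's documented on-disk format:

```python
import json
from pathlib import Path

def plan_resume(images, checkpoint_path, retry_failures=False):
    """Decide which plates still need processing, given a checkpoint."""
    state = json.loads(Path(checkpoint_path).read_text())
    # Plates already processed are always skipped on resume.
    skip = set(state.get("done", []))
    # Failed plates are skipped too, unless --retry-failures is given.
    if not retry_failures:
        skip |= set(state.get("failed", []))
    return [img for img in images if img not in skip]
```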

Useful Flags#

  • ``--dry-run`` — validate the pipeline and list images without processing

  • ``--sample 5`` — process only 5 random images (great for testing)

  • ``--checkpoint-interval 50`` — save state every 50 images

  • ``--force-local`` — run locally even if SLURM is available

  • ``--restart`` — clear all state and start fresh

Clean Up#

[3]:
import os
os.remove("batch_pipeline.json")

Summary#

You now know how to scale PhenoTypic from a single plate to hundreds:

  • Save your pipeline to JSON with .to_json()

  • Run ``python -m phenotypic`` with your pipeline, input directory, and output directory

  • ``--n-jobs`` controls parallelism

  • ``--resume`` picks up where you left off after interruptions

  • ``--dry-run`` and ``--sample`` let you test before committing to a full run

Next up: Tutorial 7: Measuring and Exporting — extract size, shape, and intensity measurements from detected colonies.