# GPU-Accelerated Colony Detection

Set up and use deep-learning-based colony detectors (SAM2, micro-sam) with GPU acceleration.

## Installation

The two GPU detectors have different packaging constraints:

| Detector | Package(s) needed | Available via | CUDA-capable? |
|---------------------|----------------------------|----------------------------|------------------------|
| `Sam2Detector` | `torch`, `torchvision`, `sam2` | **PyPI** (ships in `phenotypic[torch]`) | Yes — Linux + CUDA |
| `MicroSamDetector` | `micro_sam` (+ `torch`) | **conda-forge only**, not on PyPI | CPU by default; user-managed CUDA possible |

PhenoTypic itself is distributed via PyPI and managed with `uv`. `micro_sam` is not published on PyPI, so it is **not** included in any `phenotypic` extra. Users who need `MicroSamDetector` must install `micro_sam` themselves; the recipe below uses `pixi` for that.

### Installing `Sam2Detector` (PyPI-only)

On Linux or macOS:

```bash
uv add "phenotypic[torch]"   # torch + torchvision + sam2

# or, inside a uv-managed project:
uv sync --extra torch
```

The `torch` extra is not available on Windows — `sam2` requires CUDA `nvcc` and has no pre-built Windows wheels. Use WSL2 (Ubuntu) instead.

### Enabling `micro_sam` (optional, self-service)

`micro_sam` is only published on conda-forge. Because PhenoTypic does not own your environment, we recommend managing the combined stack in your own project with [pixi](https://pixi.sh), which speaks both conda-forge and PyPI in a single lockfile.
Create a `pixi.toml` in *your* project (not in PhenoTypic):

```toml
[project]
name = "my-phenotyping-project"
channels = ["conda-forge"]
platforms = ["osx-arm64", "linux-64", "win-64"]

[pypi-dependencies]
phenotypic = "*"
# Or, while developing against a local checkout:
# phenotypic = { path = "../PhenoTypic", editable = true }

[dependencies]
micro_sam = "*"
```

Then:

```bash
pixi install
pixi run python -m phenotypic pipeline.json /plates/ /output/
```

Because conda's `micro_sam` pulls in CPU-only conda PyTorch, combining it with `Sam2Detector`'s CUDA wheels in the same environment requires extra care (the conda torch will typically win). Keep SAM2 and micro-sam workloads in separate environments if you need both with GPU acceleration.

`MicroSamDetector` is importable from `phenotypic.nn` even when `micro_sam` is missing; the `ImportError` is deferred to the first `apply()` call and points back at these instructions.

### Alternative: pip + conda

If you already manage your environment with conda:

```bash
pip install phenotypic                   # base (or phenotypic[torch] on non-Windows)
conda install -c conda-forge micro_sam   # adds MicroSamDetector support
```

## Downloading Model Checkpoints

Both SAM2 and micro-sam download checkpoints automatically on first use. However, on SLURM clusters the compute nodes often lack internet access, so you should pre-download checkpoints on a login node before submitting jobs.

### SAM2 checkpoints

```bash
# Download the default (tiny) SAM2 checkpoint
python -m phenotypic.nn download

# Download a specific size
python -m phenotypic.nn download --model-type sam2 --model-size large

# Download all SAM2 sizes at once
python -m phenotypic.nn download --model-type sam2 --all

# Force re-download even if cached
python -m phenotypic.nn download --model-type sam2 --model-size tiny --force
```

SAM2 checkpoints are stored in the `torch.hub` cache directory (`~/.cache/torch/hub/checkpoints/` by default).
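If a job script needs to locate cached SAM2 weights programmatically, the effective directory can be derived from the environment. This is a minimal sketch assuming `torch.hub`'s standard cache layout; `sam2_checkpoint_dir` is an illustrative helper, not part of PhenoTypic:

```python
import os
from pathlib import Path

def sam2_checkpoint_dir() -> Path:
    """Effective torch.hub checkpoint directory (illustrative sketch)."""
    # torch.hub honours $TORCH_HOME, defaulting to ~/.cache/torch;
    # downloaded checkpoints live under <TORCH_HOME>/hub/checkpoints.
    torch_home = os.environ.get(
        "TORCH_HOME", os.path.join(os.path.expanduser("~"), ".cache", "torch")
    )
    return Path(torch_home) / "hub" / "checkpoints"

print(sam2_checkpoint_dir())
```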
Set the `TORCH_HOME` environment variable to change this location.

Available SAM2 sizes: `tiny` (~39 MB), `small`, `base_plus`, `large` (~900 MB).

### micro-sam checkpoints

```bash
# Download the default (vit_b_lm) micro-sam model
python -m phenotypic.nn download --model-type microsam

# Download a specific model
python -m phenotypic.nn download --model-type microsam --model-name vit_l_lm

# Download all micro-sam models
python -m phenotypic.nn download --model-type microsam --all
```

micro-sam stores checkpoints via `platformdirs`. Set `MICROSAM_CACHEDIR` to override the cache location.

### SLURM pre-caching workflow

On a cluster, download models on the login node first:

```bash
# On the login node (has internet access)
python -m phenotypic.nn download --model-type sam2 --model-size tiny
python -m phenotypic.nn download --model-type microsam --model-name vit_b_lm

# Verify they are cached
python -m phenotypic.nn list

# Now submit SLURM jobs -- compute nodes will use the cached checkpoints
python -m phenotypic pipeline.json /plates/ /output/
```

## Using Sam2Detector

`Sam2Detector` wraps Meta's SAM2 automatic mask generator. It lays a grid of prompt points over the RGB image, predicts masks at each point, filters by quality, and assembles a labelled object map.

```python
from phenotypic.nn import Sam2Detector

# Basic usage with default parameters
detector = Sam2Detector()

# Tuned for dense plates with small colonies
detector = Sam2Detector(
    model_size="small",
    points_per_side=48,
    pred_iou_thresh=0.6,
    min_mask_region_area=200,
)

# Apply to an image (downloads checkpoint on first use)
result = detector.apply(image)
print(result.num_objects)
```

### Parameter tuning for colony detection

- **`points_per_side`** (default 32): Controls the density of the prompt grid. Use 16 for large, well-separated colonies. Increase to 48--64 for dense plates with many small colonies. Higher values increase inference time quadratically.
- **`pred_iou_thresh`** (default 0.7): Minimum predicted IoU for keeping a mask. Raise to 0.85--0.95 for conservative detection (fewer false positives); lower to 0.5 to catch faint or ambiguous colonies.
- **`stability_score_thresh`** (default 0.92): Filters masks by boundary stability. Higher values keep only masks with crisp edges.
- **`min_mask_region_area`** (default 100): Minimum mask area in pixels. Increase to suppress agar texture, dust, and other small artefacts that SAM2 segments as objects. Typical range: 50--500 depending on image resolution.
- **`model_size`** (default `"tiny"`): `"tiny"` is fastest and sufficient for most colony plates. Use `"large"` for maximum mask quality on publication figures.

## Using MicroSamDetector

`MicroSamDetector` uses SAM models finetuned on large-scale microscopy datasets. It is particularly effective for brightfield and darkfield microscopy images of agar plates.

```python
from phenotypic.nn import MicroSamDetector

# Default: ViT-Base light microscopy model
detector = MicroSamDetector()

# Use the larger model for higher accuracy
detector = MicroSamDetector(model_type="vit_l_lm")

result = detector.apply(image)
```

### Model selection

Light microscopy models (recommended for agar plate imaging):

- `"vit_t_lm"` -- ViT-Tiny, fastest, good for rapid screening
- `"vit_b_lm"` -- ViT-Base (default), best speed/accuracy trade-off
- `"vit_l_lm"` -- ViT-Large, highest accuracy, most VRAM

Electron microscopy models (for organelle segmentation):

- `"vit_b_em_organelles"` -- ViT-Base
- `"vit_l_em_organelles"` -- ViT-Large

Base SAM checkpoints (without microscopy finetuning):

- `"vit_t"`, `"vit_b"`, `"vit_l"`, `"vit_h"`

## Pipeline Integration

GPU detectors work like any other PhenoTypic operation in a pipeline:

```python
import phenotypic as pht
from phenotypic.nn import Sam2Detector
from phenotypic.measure import SizeMeasurer

pipeline = pht.ImagePipeline(
    ops=[Sam2Detector(model_size="tiny", points_per_side=32)],
    measurer=SizeMeasurer(),
    name="sam2_colony_pipeline",
)

# Run the pipeline
results = pipeline.operate([image])
df = pipeline.measure([image])
```

### JSON serialization

Pipelines containing GPU detectors can be saved and loaded just like any other pipeline. The detector parameters are serialized; the model weights are not (they are re-downloaded or loaded from cache when needed):

```python
# Save
pipeline.to_json("sam2_pipeline.json")

# Load -- works without torch installed (model loads lazily on apply)
restored = pht.ImagePipeline.from_json("sam2_pipeline.json")
```

Internal state (attributes prefixed with `_`, such as the loaded model) is excluded from serialization. The model is rebuilt transparently on the next call to `apply`.

## SLURM Deployment

When a pipeline contains a `GpuDetector` operation (either `Sam2Detector` or `MicroSamDetector`), the CLI automatically adapts:

**Local execution:** Forces sequential processing (`n_jobs=1`) to avoid multiple workers competing for the same GPU.

**SLURM execution:** Automatically adds `--gpus-per-node=1` to the SLURM job if GPU resources were not explicitly requested.

```bash
# GPU resources are auto-requested when the pipeline contains a GpuDetector
python -m phenotypic sam2_pipeline.json /plates/ /output/

# Override with explicit SLURM GPU arguments
python -m phenotypic sam2_pipeline.json /plates/ /output/ \
    --slurm-args slurm_gpus_per_node=2 \
    --slurm-args slurm_partition=gpu
```

Pre-cache checkpoints on the login node before submitting (see "Downloading Model Checkpoints" above).

## Device Selection

Both detectors accept a `device` parameter that controls where inference runs.

### Automatic detection (default)

With `device="auto"` (the default), PhenoTypic probes accelerators in priority order:

1. **CUDA** -- NVIDIA GPUs
2. **MPS** -- Apple Silicon (macOS)
3. **XPU** -- Intel GPUs
4. **HPU** -- Habana Gaudi accelerators

If none is found, a `RuntimeError` is raised.
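The priority scan above amounts to returning the first backend whose availability probe succeeds. The sketch below models that logic with stand-in probe callables; the real implementation queries torch's CUDA/MPS/XPU/HPU backends, and `resolve_auto` is an illustrative name, not PhenoTypic's API:

```python
def resolve_auto(probes, allow_cpu=False):
    """Return the first available device name, scanning in priority order.

    probes: dict mapping device name -> zero-arg availability check,
    ordered by priority (CUDA > MPS > XPU > HPU). Illustrative sketch only.
    """
    for name, is_available in probes.items():
        if is_available():
            return name
    if allow_cpu:
        return "cpu"  # mirrors resolve_device(..., allow_cpu=True)
    raise RuntimeError("No accelerator available")

# Example: pretend only MPS is present
probes = {
    "cuda": lambda: False,
    "mps": lambda: True,
    "xpu": lambda: False,
    "hpu": lambda: False,
}
print(resolve_auto(probes))  # -> mps
```

With `allow_cpu=True` the scan degrades to CPU instead of raising, matching the `resolve_device("auto", allow_cpu=True)` behaviour described under "Device Selection".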
### Explicit device

```python
# Force a specific device
Sam2Detector(device="cuda")   # NVIDIA GPU
Sam2Detector(device="mps")    # Apple Silicon
Sam2Detector(device="xpu")    # Intel GPU
Sam2Detector(device="cpu")    # CPU (very slow, but always available)
```

When an explicit accelerator is requested but unavailable, a `RuntimeError` is raised with a descriptive message.

### `resolve_device()` utility

The device resolution logic is available as a standalone function for custom workflows:

```python
from phenotypic.nn._checkpoint_manager import resolve_device

device = resolve_device("auto")                  # raises if no accelerator
device = resolve_device("auto", allow_cpu=True)  # falls back to CPU with warning
```

## Listing and Clearing Models

### List cached checkpoints

```bash
python -m phenotypic.nn list
```

This prints a table showing all cached SAM2 and micro-sam checkpoints with their file sizes and paths.

### Clear cached checkpoints

```bash
# Clear all cached checkpoints (prompts for confirmation)
python -m phenotypic.nn clear

# Clear only SAM2 checkpoints
python -m phenotypic.nn clear --model-type sam2

# Clear only micro-sam checkpoints
python -m phenotypic.nn clear --model-type microsam
```

## Troubleshooting

### `ImportError: Sam2Detector requires the sam2 package`

PyTorch and the model packages are not installed. Install the `torch` extra:

```bash
uv add "phenotypic[torch]"
```

(Linux/macOS only — `sam2` is not packaged for Windows.)

### `ImportError: MicroSamDetector requires the micro_sam package`

`micro_sam` is conda-only and must be installed separately. See [Enabling `micro_sam` (optional, self-service)](#enabling-micro_sam-optional-self-service) above.

### `RuntimeError: No accelerator available`

No GPU was detected. Options:

- Ensure your GPU drivers and CUDA toolkit are installed correctly.
- On macOS with Apple Silicon, ensure PyTorch >= 2.0 with MPS support.
- Pass `device="cpu"` to force CPU inference (very slow):

  ```python
  Sam2Detector(device="cpu")
  ```

### `RuntimeError: device='cuda' requested but CUDA is not available`

CUDA was explicitly requested but is not available. Check:

- `nvidia-smi` shows your GPU.
- PyTorch was installed with CUDA support (`torch.cuda.is_available()` returns `True`).
- On SLURM, the job was submitted to a GPU partition.

### Out of memory (OOM) errors

SAM models require significant GPU memory. To reduce VRAM usage:

- Use a smaller model: `Sam2Detector(model_size="tiny")` instead of `"large"`.
- Use `MicroSamDetector(model_type="vit_t_lm")` for the smallest micro-sam model.
- Reduce `points_per_side` (e.g., 16 instead of 32) to generate fewer candidate masks.
- Process smaller images or downscale before detection.

### Checkpoint not found on SLURM compute nodes

Compute nodes often lack internet access. Pre-download checkpoints on the login node:

```bash
python -m phenotypic.nn download --model-type sam2 --model-size tiny
python -m phenotypic.nn download --model-type microsam --model-name vit_b_lm
python -m phenotypic.nn list   # verify
```

Ensure `TORCH_HOME` and `MICROSAM_CACHEDIR` (if customised) point to a shared filesystem accessible from compute nodes.
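Putting the pieces together, a batch script can pin both caches to shared storage before running the pipeline. This is a sketch of such a script; the `#SBATCH` directives and the `/shared/scratch` path are placeholders to adapt to your cluster:

```shell
#!/bin/bash
#SBATCH --gpus-per-node=1
#SBATCH --partition=gpu

# Redirect both checkpoint caches to a filesystem visible from compute nodes
# (/shared/scratch is an example path; use your site's shared storage)
export TORCH_HOME=/shared/scratch/$USER/torch-cache
export MICROSAM_CACHEDIR=/shared/scratch/$USER/microsam-cache

python -m phenotypic pipeline.json /plates/ /output/
```

With the caches on shared storage, the login-node pre-download step and the compute-node jobs read and write the same checkpoint files.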