exlab_wizard.validator.engine

Validator engine. Backend Spec §4.4.4, §8.1, §11.7, §11.8.

The engine is the single component that implements the rules in §8.1. It runs the same rule set in two modes and exposes a read-only query alias for the second:

  • Creation-time mode (Validator.validate_creation()) – input is a resolved destination path, a resolved variable map, and the post-render content of files about to be written. Output is a flat list of Finding instances. The controller raises a ValidationError containing this list when any hard-tier finding fires (§8 bullet “Validation”). This mode does not touch the disk; it dispatches to the pure rule-check helpers in exlab_wizard.validator.rules.

  • Audit mode (Validator.audit()) – walks a directory subtree under the managed local_root (and under staging_root when orchestrator mode is on; §11.8). Output is a flat list of Finding instances sorted by (tier, rule, offending_path), with hard-tier findings first. The walk reads creation.json in each directory via msgspec.json.decode, scans text-file content within the bounds set by ValidatorConfig.content_scan_max_mib and ValidatorConfig.content_scan_extensions, and always skips binary files, which it detects with the 8-KiB null-byte sniff (§8.1.1).

  • Validator.query_problems() – public read-only alias for Validator.audit() that satisfies the §11.8 problem-query contract. Does not mutate creation.json, does not write log entries, does not initiate sync.
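The creation-time gate described above (raise when any hard-tier finding fires, otherwise pass the list through) can be sketched as follows. This is a minimal illustration, not the controller's code: the Finding and ValidationError classes here are stand-ins that carry only the fields the sketch needs, and raise_if_hard is a hypothetical helper name.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    """Stand-in for the real Finding type; only the fields used here."""
    tier: str            # "hard" or "soft"
    rule: str
    offending_path: str


class ValidationError(Exception):
    """Stand-in for the controller's error; carries the full finding list."""

    def __init__(self, findings):
        super().__init__(f"{len(findings)} validation finding(s)")
        self.findings = list(findings)


def raise_if_hard(findings):
    """Raise when any hard-tier finding is present; otherwise return the list."""
    if any(f.tier == "hard" for f in findings):
        raise ValidationError(findings)
    return findings
```

Soft-tier findings alone do not block creation in this sketch; the caller still receives them and can surface them as warnings.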

Performance commitments (§4.5, §11.8):

  • The directory walk uses os.scandir (NOT pathlib.Path.rglob). DirEntry.is_dir() / is_file() are cached from the iteration, so the walk avoids per-entry stat() syscalls.

  • creation.json is decoded via msgspec.json.decode.

  • Regex patterns are pre-compiled at module load (constants/patterns.py).

  • Pattern matching uses stdlib re only (no hyperscan, ripgrep) so the §11.8 determinism contract holds across hosts.
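As a rough illustration of the os.scandir commitment above, the sketch below walks a tree iteratively and relies on the type information DirEntry caches during the scan. The function name walk_sorted is hypothetical, and sorting each directory's entries by name is an assumption made here to keep the traversal order independent of the host file system; the engine's actual walk may order entries differently.

```python
import os


def walk_sorted(root):
    """Yield (path, is_dir) for every entry under root, in a stable order."""
    stack = [root]
    while stack:
        current = stack.pop()
        with os.scandir(current) as it:
            # Sort by name so the traversal order does not depend on the
            # order the operating system happens to return entries in.
            entries = sorted(it, key=lambda e: e.name)
        subdirs = []
        for entry in entries:
            # is_dir()/is_file() normally reuse type information gathered
            # during the directory scan instead of issuing a new stat().
            if entry.is_dir(follow_symlinks=False):
                yield entry.path, True
                subdirs.append(entry.path)
            elif entry.is_file(follow_symlinks=False):
                yield entry.path, False
        # Push in reverse so the alphabetically first subdirectory is
        # popped (and therefore walked) first.
        stack.extend(reversed(subdirs))
```

Symlinks are neither followed nor reported in this sketch, which avoids cycles; whether the engine treats symlinks the same way is not stated in this section.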

Determinism (§11.8). The same inputs always produce the same finding list in the same order. The constructor accepts a ValidatorConfig so the per-lab content-scan tuning (size cap, extension list) is captured as part of the input contract; if no config is supplied the engine uses the documented defaults from §9.

Sort order. The engine returns findings sorted by (tier, rule, offending_path). tier sorts "hard" before "soft" (matching the §11.8 contract that hard-tier findings appear first in the Problems tab); rule and offending_path sort lexicographically. Findings with identical (tier, rule, offending_path) compare equal, and because the sort is stable they keep their insertion order.
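The ordering can be reproduced with a plain key function and Python's stable sorted(). The sketch below is an illustration under assumptions: Finding is a stand-in dataclass, and TIER_RANK and sort_findings are hypothetical names rather than the engine's own.

```python
from dataclasses import dataclass

# Lower rank sorts first, so "hard" precedes "soft".
TIER_RANK = {"hard": 0, "soft": 1}


@dataclass(frozen=True)
class Finding:
    """Stand-in for the real Finding type."""
    tier: str
    rule: str
    offending_path: str
    message: str = ""


def sort_findings(findings):
    """Sort by (tier, rule, offending_path); ties keep insertion order."""
    return sorted(
        findings,
        key=lambda f: (TIER_RANK[f.tier], f.rule, f.offending_path),
    )
```

The message field is deliberately left out of the key, so two findings that differ only in message stay in the order they were produced.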

Classes

AuditScopeAll

Audit every configured equipment root, plus staging when orchestrator mode is on.

AuditScopeEquipment

Audit one equipment subtree.

AuditScopeProject

Audit one <equipment>/<project> subtree.

CreationValidationInput(proposed_path[, ...])

Input bundle for creation-time validation.

Validator([validator_config, ...])

Run §8.1 rules in creation-time and audit modes.

class exlab_wizard.validator.engine.AuditScopeAll

Bases: TypedDict

Audit every configured equipment root, plus staging when orchestrator mode is on.

Spec §11.8. This scope has no value field; the constant kind "all" is the discriminator.

kind: Literal['all']

class exlab_wizard.validator.engine.AuditScopeEquipment

Bases: TypedDict

Audit one equipment subtree. Spec §11.8.

The value is the equipment ID (matched against the configured equipment[].id list); the engine resolves it to the equipment’s local_root via the equipment-config map handed to the constructor.

kind: Literal['equipment_id']
value: str

class exlab_wizard.validator.engine.AuditScopeProject

Bases: TypedDict

Audit one <equipment>/<project> subtree. Spec §11.8.

The value is an absolute project-level directory path. Useful for the per-project Problems tab view (Frontend §3.8).

kind: Literal['project_path']
value: str

class exlab_wizard.validator.engine.CreationValidationInput(proposed_path, variables=<factory>, file_names=(), file_contents=<factory>, run_kind='experimental', template_required_field_ids=(), config_required_field_ids=(), readme_fields=<factory>)

Bases: object

Input bundle for creation-time validation. Backend Spec §8.1, §11.8.

All fields are positional-or-keyword. The dataclass is frozen so callers cannot mutate the bundle between dispatch passes; this matches the determinism contract (§11.8).

proposed_path

The destination path the creation controller is about to write to. Used to derive the per-segment lists for the path-segment rules. Accepts / and \ as separators (the splitter handles both).

variables

The resolved Copier variable dict. Keys are the template question IDs (lower snake case) and values are the resolved values. Reserved for downstream rules; not directly consumed by the §8.1 rule set today.

file_names

File names that will be written into the destination. Bare names without directory components.

file_contents

Post-render content for files about to be written (text only; binaries excluded by the caller). Keys are the names from file_names for the entries that have content.

run_kind

"experimental" or "test"; mirrors the creation.json run_kind value.

template_required_field_ids

README field ids the template marks required (parsed from copier.yml _exlab_* metadata).

config_required_field_ids

README field ids that config.yaml readme.defaults marks required.

readme_fields

The merged readme_fields_json dict the controller is about to write. Used by the missing-required-field rule.

Parameters:

config_required_field_ids: tuple[str, ...] = ()
file_contents: Mapping[str, str]
file_names: tuple[str, ...] = ()
proposed_path: str
readme_fields: Mapping[str, object]
run_kind: str = 'experimental'
template_required_field_ids: tuple[str, ...] = ()
variables: Mapping[str, object]
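To make the frozen-bundle and separator-handling points concrete, here is a small sketch. CreationInputSketch is an illustrative stand-in with a subset of the fields, not the real CreationValidationInput, and split_segments is a hypothetical splitter; dropping empty segments is an assumption of this sketch.

```python
import re
from dataclasses import FrozenInstanceError, dataclass, field
from typing import Mapping


@dataclass(frozen=True)
class CreationInputSketch:
    """Illustrative stand-in with a subset of the documented fields."""
    proposed_path: str
    # default_factory gives each instance its own dict, which is what the
    # "<factory>" default in the rendered signature indicates.
    variables: Mapping[str, object] = field(default_factory=dict)
    file_names: tuple[str, ...] = ()
    run_kind: str = "experimental"


def split_segments(path):
    """Split on both '/' and '\\', dropping empty segments (an assumption)."""
    return [seg for seg in re.split(r"[\\/]+", path) if seg]
```

Assigning to a field of a frozen dataclass raises FrozenInstanceError, so a bundle built once cannot be rebound between rule passes. Note that frozen only blocks attribute assignment; a dict passed in as variables can still be mutated by whoever holds a reference to it.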
class exlab_wizard.validator.engine.Validator(validator_config=None, *, equipment_roots=None, staging_root=None)

Bases: object

Run §8.1 rules in creation-time and audit modes. Backend Spec §11.8.

The constructor accepts a ValidatorConfig – the §9 validator block – so callers can tune the content-scan size cap and extension list. When validator_config is omitted, the engine constructs a fresh ValidatorConfig with the §9 defaults (content_scan_max_mib=5 and the canonical extension list).

Audit-mode callers also pass the equipment-roots map (mapping equipment_id -> absolute equipment directory) and an optional staging_root. These default to empty when audit mode is not in use; creation-time-only callers can omit them.

audit(scope)

Walk a directory subtree and return all findings.

Backend Spec §11.8. Uses os.scandir (NOT pathlib.Path.rglob) per Backend §4.5. Reads creation.json via msgspec.json.decode(..., type=CreationJson) where present. Text-file content scans are bounded by content_scan_max_mib and content_scan_extensions. Binary files are always skipped via the 8-KiB null-byte sniff (§8.1.1).
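The bounded content scan can be pictured as three cheap gates in front of any regex work. The sketch below is an illustration, not the engine's code: looks_binary and should_scan are hypothetical names, the extension tuple is a placeholder because the canonical list is not reproduced in this section, and the order of the checks and the inclusive size cap are assumptions.

```python
import os

SNIFF_BYTES = 8 * 1024          # the 8-KiB sniff window
MAX_MIB = 5                     # the documented content_scan_max_mib default
EXTENSIONS = (".md", ".txt")    # placeholder; the real list is not shown here


def looks_binary(path, sniff_bytes=SNIFF_BYTES):
    """Return True when a NUL byte appears in the first sniff_bytes bytes."""
    with open(path, "rb") as fh:
        return b"\x00" in fh.read(sniff_bytes)


def should_scan(path, max_mib=MAX_MIB, extensions=EXTENSIONS):
    """Apply the extension filter, then the size cap, then the binary sniff."""
    if os.path.splitext(path)[1].lower() not in extensions:
        return False
    if os.path.getsize(path) > max_mib * 1024 * 1024:
        return False
    return not looks_binary(path)
```

Putting the sniff last means a file is only opened once it has already passed the two metadata-only checks.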

scope is one of:

  • {"kind": "equipment_id", "value": "<id>"} – one equipment subtree (resolved via the equipment-roots map handed to the constructor).

  • {"kind": "project_path", "value": "<absolute path>"} – one project subtree.

  • {"kind": "all"} – every configured equipment plus the staging root when orchestrator is on.
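The three scope shapes listed above form a discriminated union on kind. A minimal sketch of how such a scope could be resolved to the directories to walk is shown below. The TypedDict declarations mirror the documented fields; resolve_roots is a hypothetical helper, and treating "orchestrator is on" as "a staging root was supplied" is an assumption of this sketch.

```python
from typing import Literal, TypedDict, Union


class AuditScopeAll(TypedDict):
    kind: Literal["all"]


class AuditScopeEquipment(TypedDict):
    kind: Literal["equipment_id"]
    value: str


class AuditScopeProject(TypedDict):
    kind: Literal["project_path"]
    value: str


AuditScope = Union[AuditScopeEquipment, AuditScopeProject, AuditScopeAll]


def resolve_roots(scope, equipment_roots, staging_root=None):
    """Hypothetical helper: map a scope dict to the directories to walk."""
    kind = scope["kind"]
    if kind == "equipment_id":
        # Look the ID up in the equipment-roots map.
        return [equipment_roots[scope["value"]]]
    if kind == "project_path":
        # The value is already an absolute project directory.
        return [scope["value"]]
    if kind == "all":
        # Iterate IDs in sorted order so the result is reproducible.
        roots = [equipment_roots[key] for key in sorted(equipment_roots)]
        if staging_root is not None:
            roots.append(staging_root)
        return roots
    raise ValueError(f"unknown audit scope kind: {kind!r}")
```

An unknown equipment ID surfaces as a KeyError here; the real engine may report that case differently.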

Returns a Finding list sorted by (tier, rule, offending_path), with hard-tier findings first. The list is deterministic across repeated calls with the same fixture; this contract is pinned by test_validator_determinism.py.

Parameters:

scope (AuditScopeEquipment | AuditScopeProject | AuditScopeAll)

Return type:

list[Finding]

property config: ValidatorConfig

The ValidatorConfig this engine instance was built with.

Exposed read-only so audit-mode helpers can consult the same content-scan limits as the creation-time pass.

classmethod from_config(config)

Build a Validator from the full config.yaml model.

Projects the relevant fields out of exlab_wizard.config.models.Config so the engine is not coupled to the entire config schema. Used by the FastAPI lifespan when wiring the audit task.

Parameters:

config (Any)

Return type:

Validator

query_problems(scope)

Public read-only alias for audit().

Backend Spec §11.8. Read-only: does not mutate creation.json, does not write log entries, does not initiate sync. The GUI’s per-row actions (mark-as-known, override) call dedicated mutation endpoints rather than this method.

Parameters:

scope (AuditScopeEquipment | AuditScopeProject | AuditScopeAll)

Return type:

list[Finding]

validate_creation(params)

Run every §8.1 creation-time rule against params.

Returns a flat list of Finding instances sorted by (tier, rule, offending_path) with hard-tier findings first.

Dispatch order (each helper returns list[dict] in the rules-module contract; the engine stamps each dict with the common Finding fields the helper does not know):

  1. check_unresolved_placeholder – against path segments, file names, and the file contents map. Markdown front-matter extraction happens inside the rule helper.

  2. check_illegal_filesystem_character – against path segments and file names.

  3. check_reserved_filesystem_name – against file names (Windows reserved-name set; case-insensitive).

  4. check_mode_prefix_mismatch – against the leaf and parent of the proposed path, with the declared run_kind.

  5. check_missing_required_field – against the merged readme_fields dict and the union of required IDs from the template + config layers.

  6. check_malformed_yaml_front_matter – against file_contents['README.md'] if present.

The orphan rule (§8.1.4) is not dispatched here – it is an audit-mode rule by spec. The mode-prefix mismatch rule is the only one of the seven that consults run_kind; everything else is structural.
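The dispatch-and-stamp pattern described above (helpers return plain dicts, the engine adds the fields they cannot know, then sorts) might look roughly like this. Everything in the sketch is illustrative: the two sketch_* helpers are simplified stand-ins and not the functions in exlab_wizard.validator.rules, the dict keys and rule ID strings are invented for the example, and stamping every finding as "hard" is an assumption because per-rule tiers are not given in this section.

```python
from dataclasses import dataclass

ILLEGAL_CHARS = frozenset('<>:"/\\|?*')
WINDOWS_RESERVED = frozenset(
    ["CON", "PRN", "AUX", "NUL"]
    + [f"COM{i}" for i in range(1, 10)]
    + [f"LPT{i}" for i in range(1, 10)]
)
TIER_RANK = {"hard": 0, "soft": 1}


@dataclass(frozen=True)
class Finding:
    """Stand-in for the real Finding type."""
    tier: str
    rule: str
    offending_path: str
    message: str


def sketch_illegal_character(file_names):
    """Stand-in helper: flag names containing Windows-illegal characters."""
    out = []
    for name in file_names:
        bad = sorted(set(name) & ILLEGAL_CHARS)
        if bad:
            out.append({"rule": "illegal_filesystem_character", "name": name,
                        "message": f"{name!r} contains {''.join(bad)!r}"})
    return out


def sketch_reserved_name(file_names):
    """Stand-in helper: flag names whose stem is reserved, ignoring case."""
    out = []
    for name in file_names:
        stem = name.split(".", 1)[0]
        if stem.upper() in WINDOWS_RESERVED:
            out.append({"rule": "reserved_filesystem_name", "name": name,
                        "message": f"{name!r} is reserved on Windows"})
    return out


def run_checks(proposed_path, file_names):
    """Call helpers in a fixed order, stamp their dicts, then sort."""
    findings = []
    for helper in (sketch_illegal_character, sketch_reserved_name):
        for raw in helper(file_names):
            findings.append(Finding(
                tier="hard",  # assumption: real tiers are set per rule
                rule=raw["rule"],
                offending_path=f"{proposed_path}/{raw['name']}",
                message=raw["message"],
            ))
    return sorted(
        findings,
        key=lambda f: (TIER_RANK[f.tier], f.rule, f.offending_path),
    )
```

Keeping the helpers pure (names in, dicts out) is what lets the same checks serve both creation-time and audit callers; only the stamping step knows about paths and tiers.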

Parameters:

params (CreationValidationInput)

Return type:

list[Finding]