exlab_wizard.sync#
Sync package. Backend Spec §7.
Public re-exports for the NAS sync subsystem. Callers should depend on
this package’s surface rather than reach into the sub-modules; the
NASSyncClient is the only stateful object that outside
callers typically interact with.
- exlab_wizard.sync.HandleState#
  alias of SyncHandleState
- class exlab_wizard.sync.NASSyncClient(*, config, queue_db, validator, cache_creation, verifier=None, worker_poll_interval_s=0.05, push_callable_factory=None, hashsum_callable_factory=None, remote_stat_callable=None)[source]#
Bases: object
Durable, per-equipment NAS sync queue with Pre-Sync Gate.
Backend Spec §7.1, §7.3.
Lifecycle:
- init() opens the queue DB, replays any in-flight jobs, and starts a single background worker task.
- enqueue() runs the Pre-Sync Gate, gates the run if needed, and otherwise inserts a QUEUED row.
- close() cancels the worker and closes the DB.
The worker loop is a simple “pick the oldest QUEUED whose
next_attempt_at has passed” scheduler with at most one inflight job at a
time. This keeps determinism for tests; production deployments can extend
to per-equipment parallelism without changing the public API.
- Parameters:
  - config (Config)
  - queue_db (Path)
  - validator (Validator)
  - cache_creation (CreationWriter)
  - worker_poll_interval_s (float)
  - push_callable_factory (Callable[[EquipmentConfig], Callable[..., Any]] | None)
  - hashsum_callable_factory (Callable[[EquipmentConfig], Callable[[Path], Awaitable[dict[str, str]]]] | None)
  - remote_stat_callable (Callable[[SyncJobRow], bool] | None)
- async enqueue(run_path)[source]#
Pre-Sync Gate -> if there is a hard-tier finding without an override,
mark sync_status='blocked_by_validation'. Otherwise insert a QUEUED row.
Returns a SyncJobHandle. The handle’s state is either
SyncHandleState.BLOCKED or SyncHandleState.QUEUED.
- Parameters:
  run_path (Path)
- Return type:
  SyncJobHandle
- async force_verify(run_path)[source]#
Recompute the local manifest and verify against itself.
Used by the Settings “verify integrity” action. Does not advance the queue state.
- Parameters:
  run_path (Path)
- Return type:
- async status(run_path)[source]#
Return the queue state of the job for run_path:
"none" when no job exists; otherwise the underlying SyncJobState value.
- class exlab_wizard.sync.SyncHandleState(*values)[source]#
Bases: StrEnum
In-process state of a NAS sync job handle as observed by callers.
Distinct from SyncStatus (which is persisted in creation.json) and from
the queue’s internal SyncJobState (which tracks the row in
sync_queue.db). Backend Spec §7.1.
- BLOCKED = 'blocked'#
- QUEUED = 'queued'#
- class exlab_wizard.sync.SyncJobHandle(job_id, state, run_path, blocking_findings=())[source]#
Bases: object
Lightweight handle returned by NASSyncClient.enqueue(). job_id is empty
when the gate blocked enqueue (the on-disk sync_status will reflect the
block). blocking_findings is present iff state == BLOCKED.
- Parameters:
- state: SyncHandleState#
- class exlab_wizard.sync.SyncJobRow(id, run_path, equipment_id, state, attempts=0, last_attempt_at=None, next_attempt_at=None, last_error=None, verify_passes=0, verified_at=None, enqueued_at='', nas_path=None)[source]#
Bases: object
One row in the jobs table. Backend Spec §7.1.1.
- Parameters:
- state: SyncJobState#
- class exlab_wizard.sync.SyncJobState(*values)[source]#
Bases: StrEnum
State machine for a sync job. Backend Spec §7.1.2.
- AWAITING_VERIFY = 'awaiting_verify'#
- CLEANED = 'cleaned'#
- CLEANUP_ELIGIBLE = 'cleanup_eligible'#
- FAILED = 'failed'#
- QUEUED = 'queued'#
- RUNNING = 'running'#
- VERIFIED = 'verified'#
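Pieced together from the method docstrings on this page (worker hand-off, startup replay, verifier pickup, cleanup, and the manual Retry action), the machine can be sketched as a transition table. This is an inferred sketch, not the spec’s authoritative table.

```python
# Sketch of the §7.1.2 state machine as implied by the docstrings on this
# page. Assumed and possibly incomplete; Backend Spec §7.1.2 is authoritative.
TRANSITIONS = {
    "queued": {"running", "failed"},
    "running": {"awaiting_verify", "queued", "failed"},  # -> "queued" via startup replay
    "awaiting_verify": {"verified", "failed"},
    "verified": {"cleanup_eligible"},
    "cleanup_eligible": {"cleaned"},
    "cleaned": set(),                                    # terminal
    "failed": {"queued"},                                # manual Retry (§7.1.5)
}

def can_transition(src: str, dst: str) -> bool:
    """Return True if dst is a legal next state for src in this sketch."""
    return dst in TRANSITIONS.get(src, set())
```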
- class exlab_wizard.sync.SyncQueue(db_path)[source]#
Bases: object
Async SQLite-backed durable sync queue. Backend Spec §7.1.1.
Use init() once at application startup; the database file is created on
demand. After init, all CRUD methods are coroutines.
- Parameters:
  db_path (Path)
- async get_by_id(job_id)[source]#
Return the row with the given job_id, or None.
- Parameters:
  job_id (str)
- Return type:
- async get_by_run_path(run_path)[source]#
Return the row whose run_path matches, or None.
- Parameters:
  run_path (Path)
- Return type:
- async init()[source]#
Open the connection, ensure the schema, and replay in-flight rows.
Replay semantics (§7.1.2): any RUNNING row at startup gets downgraded to
QUEUED (the worker died mid-transfer); any AWAITING_VERIFY row is left
as-is so the verifier picks it up.
- Return type:
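The replay rule reduces to a one-line mapping. A sketch, assuming the StrEnum string values listed on this page:

```python
def replay_state(state: str) -> str:
    """§7.1.2 startup replay, sketched.

    A RUNNING row at startup means the worker died mid-transfer, so it is
    downgraded to QUEUED; everything else (including AWAITING_VERIFY) is
    left untouched for the normal machinery to pick up.
    """
    return "queued" if state == "running" else state
```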
- async insert(*, run_path, equipment_id, nas_path=None, job_id=None)[source]#
Insert a new QUEUED row for run_path.
Raises aiosqlite.IntegrityError (via the UNIQUE constraint on run_path)
if a row already exists for the same path.
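The UNIQUE behavior can be demonstrated with the synchronous sqlite3 module against a minimal stand-in schema. The real jobs table (§7.1.1) has more columns; only the UNIQUE constraint on run_path is the point here.

```python
import sqlite3

def try_duplicate_insert() -> bool:
    """Return True if a second insert for the same run_path is rejected.

    Stand-in schema for illustration; aiosqlite surfaces the same
    sqlite3.IntegrityError from the underlying driver.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE jobs (id TEXT PRIMARY KEY, run_path TEXT UNIQUE, state TEXT)"
    )
    conn.execute("INSERT INTO jobs VALUES ('j1', '/runs/r1', 'queued')")
    try:
        conn.execute("INSERT INTO jobs VALUES ('j2', '/runs/r1', 'queued')")
    except sqlite3.IntegrityError:
        return True  # duplicate run_path rejected by the UNIQUE constraint
    return False
```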
- static is_terminal(state)[source]#
Return True if the state is a terminal state (no further work).
- Parameters:
  state (SyncJobState)
- Return type:
  bool
- async list_in_state(state)[source]#
Return every row currently in state, ordered by enqueued_at.
- Parameters:
  state (SyncJobState)
- Return type:
- async record_failure(job_id, error, *, terminal=False, now=None)[source]#
Record a transport failure on job_id.
If terminal is True (auth failure, local file vanished) the job goes
straight to FAILED with no backoff. Otherwise:
- increment attempts
- if attempts >= MAX_ATTEMPTS: terminal FAILED.
- else: stay in QUEUED with next_attempt_at per backoff.
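The branch structure above reads naturally as a pure function. A sketch: MAX_ATTEMPTS=5 and the 30-second exponential-backoff base are assumptions for illustration, not the spec’s actual values.

```python
from datetime import datetime, timedelta

MAX_ATTEMPTS = 5      # assumed cap; the real value comes from the spec/config
BACKOFF_BASE_S = 30   # assumed base for the exponential backoff schedule

def record_failure(attempts: int, *, terminal: bool, now: datetime):
    """Return (new_state, attempts, next_attempt_at) per the rules above."""
    if terminal:
        return "failed", attempts, None          # auth / file vanished: no backoff
    attempts += 1
    if attempts >= MAX_ATTEMPTS:
        return "failed", attempts, None          # retries exhausted
    delay = timedelta(seconds=BACKOFF_BASE_S * 2 ** (attempts - 1))
    return "queued", attempts, now + delay       # stay queued, retry later
```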
- async reset_to_queued(job_id)[source]#
Reset a FAILED job back to QUEUED for a manual retry.
Per §7.1.5 the Problems-tab Retry action re-enqueues a failed job.
attempts and last_error are cleared so the backoff schedule starts fresh.
- Parameters:
  job_id (str)
- Return type:
- async transition(job_id, new_state, *, last_error=None, increment_attempts=False, increment_verify_passes=False, verified_at=None, next_attempt_at=None, last_attempt_at=None, nas_path=None)[source]#
Transition a job to new_state and patch the auxiliary columns.
The patch is one UPDATE statement, so either every column moves or none
do. Raises ValueError if the job is missing.
- exception exlab_wizard.sync.TransportError(message, *, error_kind=None)[source]#
Bases: Exception
Raised when a transport probe cannot complete.
Distinct from TransportResult: this surfaces conditions where the
transport’s hashsum-style probe could not produce a manifest at all,
either because the upstream binary is missing (error_kind=None for the
historical “binary not on PATH” case) or because the probe ran and failed
in a classifiable way (AUTH, NETWORK, UNKNOWN). The queue worker uses
error_kind to route the failure through the §7.1.5 retry policy:
- AUTH – terminal FAILED, no retry (configuration problem).
- NETWORK – non-terminal failure, exponential backoff retry.
- UNKNOWN – treated as NETWORK for retry purposes.
- None – legacy “binary missing” callers. Routed through the §7.1.5
  HASH_MISMATCH branch: one immediate retry, then terminal FAILED on the
  second occurrence. The operator surfaces the binary-missing reason via
  last_error.
- Parameters:
  - message (str)
  - error_kind (TransportErrorKind | None)
- class exlab_wizard.sync.TransportErrorKind(*values)[source]#
Bases: StrEnum
Kind of failure that the queue worker uses to drive the retry policy.
Backend Spec §7.1.5.
- NETWORK: timeout, ECONNRESET, transient SSH failure – retried with
  exponential backoff up to MAX_ATTEMPTS.
- AUTH: authentication failure – terminal FAILED, no retry.
- HASH_MISMATCH: post-transport hash check failed – single retry of the
  transport phase, then terminal.
- LOCAL_FILE_VANISHED: the local file disappeared between transport and
  verify – terminal FAILED with local_file_vanished reason.
- UNKNOWN: catch-all for transports returning a non-zero code we don’t
  recognize – treated as NETWORK for retry purposes.
- AUTH = 'auth'#
- HASH_MISMATCH = 'hash_mismatch'#
- LOCAL_FILE_VANISHED = 'local_file_vanished'#
- NETWORK = 'network'#
- UNKNOWN = 'unknown'#
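The five kinds collapse into three routes. A sketch of the §7.1.5 routing (hypothetical helper, not library API; MAX_ATTEMPTS=5 is an assumed cap):

```python
MAX_ATTEMPTS = 5  # assumed cap for illustration

def retry_route(kind: str, *, attempts: int, hash_retry_used: bool) -> str:
    """Map a TransportErrorKind string value to its §7.1.5 route, as sketched."""
    if kind in ("auth", "local_file_vanished"):
        return "failed"                          # terminal, no retry
    if kind == "hash_mismatch":
        # One immediate retry of the transport phase, then terminal.
        return "failed" if hash_retry_used else "retry_transport"
    # "network" and "unknown": exponential backoff up to the cap.
    return "failed" if attempts >= MAX_ATTEMPTS else "backoff"
```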
- class exlab_wizard.sync.TransportResult(ok, error_kind=None, stderr='', stdout='', returncode=0)[source]#
Bases: object
Outcome of a transport push.
ok is True iff the transport reported success. On failure, error_kind
selects the retry path; stderr is the raw stderr text for log surfacing;
returncode is the subprocess exit code.
- Parameters:
- error_kind: TransportErrorKind | None#
- class exlab_wizard.sync.Verifier[source]#
Bases: object
SHA-256 verifier. Backend Spec §7.1.4.
- async compute_local_manifest(run_path)[source]#
Walk run_path and compute a SHA-256 per file.
Writes the manifest to run_path/.exlab-wizard/checksums.sha256 as a
side-effect (the §7.1.4 contract). Files inside the .exlab-wizard/cache
subtree are excluded so the manifest does not record its own hash.
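A sketch of the walk-and-hash contract, simplified for illustration: it returns the manifest instead of also writing checksums.sha256, and the cache exclusion follows the wording above.

```python
import hashlib
from pathlib import Path

def compute_local_manifest(run_path: Path) -> dict[str, str]:
    """SHA-256 per file, keyed by POSIX path relative to run_path.

    Sketch of the §7.1.4 walk; the real method also writes the manifest to
    .exlab-wizard/checksums.sha256. The .exlab-wizard/cache subtree is
    skipped so the manifest never records its own hash.
    """
    cache = run_path / ".exlab-wizard" / "cache"
    manifest: dict[str, str] = {}
    for p in sorted(run_path.rglob("*")):
        if not p.is_file() or cache in p.parents:
            continue
        manifest[p.relative_to(run_path).as_posix()] = hashlib.sha256(
            p.read_bytes()
        ).hexdigest()
    return manifest
```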
- async verify_against_local(run_path, manifest)[source]#
Re-hash every entry in manifest against the local subtree.
Returns a VerifyResult with ok=True iff every entry in the manifest
exists locally with the recorded hash.
Files on disk that are NOT in the manifest are returned in extra for
diagnostic logging but do not by themselves cause ok=False; a partial
transport that wrote a fresh file would be caught by a later
compute_local_manifest pass.
- Parameters:
- Return type:
- verify_against_remote(local_manifest, remote_manifest)[source]#
Compare a local manifest against a remote-derived manifest.
Pure dict comparison with no I/O. Use after the transport reports
success, with remote_manifest derived from a remote hash probe (e.g.
rclone hashsum sha256 or ssh ... sha256sum).
- mismatched: keys present in both with differing hex digests.
- missing: keys present locally but absent remotely; this is the
  integrity-in-transit failure mode.
- extra: keys present remotely but not locally; informational only and
  does not flip ok.
ok = not mismatched and not missing. An empty remote_manifest therefore
yields ok=False with every local key listed in missing.
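The comparison is small enough to restate as a sketch (a standalone function for illustration; the real method returns a VerifyResult, whose field names this tuple mirrors):

```python
def verify_against_remote(local: dict[str, str], remote: dict[str, str]):
    """Pure dict comparison per the rules above.

    Returns (ok, mismatched, missing, extra); sketch, not the library's
    VerifyResult-returning method.
    """
    mismatched = sorted(k for k in local if k in remote and local[k] != remote[k])
    missing = sorted(k for k in local if k not in remote)
    extra = sorted(k for k in remote if k not in local)   # never flips ok
    return (not mismatched and not missing), mismatched, missing, extra
```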
- class exlab_wizard.sync.VerifyResult(ok, mismatched=(), missing=(), extra=(), manifest=<factory>, error_kind=None)[source]#
Bases: object
Outcome of a verifier pass.
ok is True iff every file in the manifest matched. mismatched lists
relative paths whose hash differed; missing lists paths in the manifest
that no longer exist on disk; extra lists files on disk that were not in
the manifest (informational only).
error_kind is set when the remote-hash probe could not complete (the
underlying exlab_wizard.sync.transports.TransportError classified the
failure as AUTH / NETWORK / UNKNOWN). The queue worker keys off this
field to route via the §7.1.5 retry policy: AUTH -> terminal FAILED,
NETWORK / UNKNOWN -> backoff retry. None for every non-remote-probe
outcome.
- Parameters:
- error_kind: TransportErrorKind | None#
- exlab_wizard.sync.cleanup_interlocks_satisfied(*, job, run_path, now_utc, config, overrides_active, remote_stat_ok)[source]#
Evaluate every §7.1.6 interlock; return True iff all pass.
Logs a debug entry naming the failing interlock when one fails so the
operator can see why a job stayed in CLEANUP_ELIGIBLE.
The run_path parameter is accepted (but currently unused) so callers can
pass the run directory through unchanged; future interlocks (e.g., a
size-on-disk threshold) may consult it.
- Parameters:
  - job (SyncJobRow)
  - run_path (Any)
  - now_utc (datetime)
  - config (NASCleanupConfig)
  - remote_stat_ok (bool)
- Return type:
- exlab_wizard.sync.effective_bandwidth_limit_kibps(cfg, *, now_local)[source]#
Return the effective --bwlimit in KiB/s for now_local.
Decision tree per §7.1.7:
- If cfg.upload_mbps is None -> unlimited (None).
- Else if cfg.schedule is empty -> the cap applies always.
- Else if now_local falls inside any schedule window -> the cap applies
  for this transfer.
- Else -> unlimited (None).
- Parameters:
  - cfg (BandwidthConfig)
  - now_local (datetime)
- Return type:
- exlab_wizard.sync.is_eligible(*, validator, creation_json_path, creation)[source]#
Evaluate the §7.3 eligibility rule for a run.
Returns (True, []) iff there is no hard-tier finding without an active
override. Otherwise returns (False, blocking_findings) where
blocking_findings is the list of unmasked hard-tier findings (so the
caller can surface them in logs).
The creation_json_path is the path to .exlab-wizard/creation.json; the
run directory is its parent’s parent. The validator runs in creation-time
mode (no walk; just rules over path segments + file names + the cached
creation payload).
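The rule reduces to a filter over findings. A sketch: the finding shape here (a 'tier' string plus an 'override_active' flag) is an assumption about the Validator output, not its real types.

```python
def is_eligible(findings: list[dict]) -> tuple[bool, list[dict]]:
    """§7.3 rule, sketched: eligible iff no hard-tier finding lacks an
    active override.

    Finding shape is assumed for illustration; the real function takes a
    validator, creation_json_path, and creation payload.
    """
    blocking = [
        f for f in findings
        if f["tier"] == "hard" and not f.get("override_active", False)
    ]
    return (not blocking, blocking)
```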
Modules
- Bandwidth schedule evaluator.
- Cleanup safety interlocks.
- NAS sync client.
- Pre-Sync Gate.
- Durable SQLite-backed sync-job queue.
- Sync transports package.
- SHA-256 hash verifier for synced runs.