coco_pipe.dim_reduction.artifacts

Artifact persistence helpers for dimensionality-reduction runs.

This module owns the save/load/update layer that the dim-reduction pipeline uses to checkpoint fit results and post-hoc eval results to disk. Keeping these functions in coco-pipe (rather than in a consumer project) means any project that runs the dim-reduction pipeline can share the same artifact format without duplicating the serialization logic.

Public API

save_fit_artifact: Write embedding, ids, fit payload, metrics, and diagnostics to a directory and stamp it with _SUCCESS.
save_eval_artifact: Write an eval payload to a directory and stamp it with _SUCCESS.
load_fit_artifact: Read a fit artifact directory back into a dict.
load_fit_runs: Read a JSON runs inventory file back into a list of dicts.
update_runs: Upsert a record into a JSON runs inventory, keeping it sorted.

Constants

SEPARATION_METRIC_KEY: Canonical key for the logistic-regression separation metric returned by coco_pipe.dim_reduction.evaluation.core.evaluate_embedding().
FIT_METRIC_COLUMNS: Ordered list of geometry quality metric names written by fit artifacts.
EVAL_METRIC_COLUMNS: Ordered list of eval metric names written by eval artifacts.
FIT_RUN_KEY_FIELDS: Fields that uniquely identify a fit run in the runs inventory.
EVAL_RUN_KEY_FIELDS: Fields that uniquely identify an eval run in the runs inventory.

Functions

`save_fit_artifact`(path, embedding, ids, fit_payload, ...)	Write a fit artifact to path and stamp it with `_SUCCESS`.
`save_eval_artifact`(path, eval_payload)	Write an eval artifact to path and stamp it with `_SUCCESS`.
`load_fit_artifact`(path)	Load a fit artifact directory into a dict.
`load_fit_runs`(path)	Load a fit-runs inventory JSON file into a list of dicts.
`update_runs`(path, record, key_fields)	Upsert record into a JSON runs inventory at path.
`build_record`(payload, artifact_path, output_root, ...)	Build a flat run-inventory record from an artifact payload dict.
`write_run_status`(output_root, fit_runs_path, ...[, ...])	Write `run_summary.json` and a run-marker sentinel to output_root.
`build_availability_record`(*, scope, condition, ...)	Build a data-availability record from container shape information.

Module Contents

coco_pipe.dim_reduction.artifacts.save_fit_artifact(path, embedding, ids, fit_payload, metrics_payload, diagnostics)

Write a fit artifact to path and stamp it with _SUCCESS.

The artifact directory receives three files (a compact layout that keeps the inode count low for high-cardinality analysis modes):

fit.npz — embedding, ids, and reducer diagnostics in one archive
fit.json — the fit payload and geometry quality metrics
_SUCCESS — sentinel that marks the artifact as complete

load_fit_artifact() also reads the older seven-file layout, so existing artifacts remain loadable.

Parameters:

path (pathlib.Path) – Target directory. Created (including parents) if it does not exist.
embedding (numpy.ndarray) – Array of shape (n_obs, n_dims).
ids (numpy.ndarray) – 1-D array of observation identifiers aligned with embedding.
fit_payload (dict[str, Any]) – Serialisable dict describing the fit (reducer, config, provenance, …). The key "artifact_stem" is recorded for provenance.
metrics_payload (dict[str, Any]) – Serialisable dict of geometry quality metrics (trustworthiness, …).
diagnostics (dict[str, Any]) – Serialisable dict of reducer-internal diagnostics. May be empty.

Return type:

None
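
The compact three-file layout described above can be sketched as follows. This is a minimal illustration, not the library's implementation: save_fit_sketch is a hypothetical stand-in, and both the JSON-encoded diagnostics entry inside fit.npz and the "fit" / "metrics" keys inside fit.json are assumptions, since the internal encoding of those files is not documented here.

```python
import json
import tempfile
from pathlib import Path

import numpy as np


def save_fit_sketch(path, embedding, ids, fit_payload, metrics_payload, diagnostics):
    """Hypothetical stand-in that writes fit.npz, fit.json, and _SUCCESS."""
    path = Path(path)
    path.mkdir(parents=True, exist_ok=True)

    # Assumption: diagnostics are stored as a JSON string inside the archive.
    np.savez(
        str(path / "fit.npz"),
        embedding=np.asarray(embedding),
        ids=np.asarray(ids),
        diagnostics=np.array(json.dumps(diagnostics)),
    )

    # Assumption: the payload and metrics live under "fit" and "metrics".
    with open(path / "fit.json", "w", encoding="utf-8") as fh:
        json.dump({"fit": fit_payload, "metrics": metrics_payload}, fh, indent=2)

    # In this sketch the sentinel is written last, so an interrupted write
    # leaves a directory without _SUCCESS.
    (path / "_SUCCESS").touch()


with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "runs" / "pca_2d"
    save_fit_sketch(
        target,
        embedding=np.zeros((4, 2)),
        ids=np.array(["a", "b", "c", "d"]),
        fit_payload={"reducer": "pca", "artifact_stem": "pca_2d"},
        metrics_payload={"trustworthiness": 0.9},
        diagnostics={},
    )
    written = sorted(p.name for p in target.iterdir())
    with np.load(str(target / "fit.npz")) as archive:
        loaded_shape = archive["embedding"].shape
```

Packing the arrays into a single archive is what keeps the file count at three regardless of how many arrays a reducer produces.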

coco_pipe.dim_reduction.artifacts.save_eval_artifact(path, eval_payload)

Write an eval artifact to path and stamp it with _SUCCESS.

The artifact directory receives two files:

eval.json — the full eval payload
_SUCCESS — sentinel that marks the artifact as complete

_load_eval_payload() also reads the older <stem>_eval.json layout.

Parameters:

path (pathlib.Path) – Target directory. Created (including parents) if it does not exist.
eval_payload (dict[str, Any]) – Serialisable dict describing the post-hoc evaluation results. The key "artifact_stem" is recorded for provenance when present.

Return type:

None
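
The role of the _SUCCESS sentinel can be illustrated with a short sketch. The helpers save_eval_sketch and is_complete are hypothetical names introduced for illustration; only the file names eval.json and _SUCCESS come from the description above.

```python
import json
import tempfile
from pathlib import Path


def save_eval_sketch(path, eval_payload):
    """Hypothetical stand-in: write eval.json, then stamp the directory."""
    path = Path(path)
    path.mkdir(parents=True, exist_ok=True)
    with open(path / "eval.json", "w", encoding="utf-8") as fh:
        json.dump(eval_payload, fh, indent=2)
    (path / "_SUCCESS").touch()


def is_complete(path):
    """Treat a directory as a finished artifact only if the sentinel exists."""
    return (Path(path) / "_SUCCESS").is_file()


with tempfile.TemporaryDirectory() as tmp:
    done = Path(tmp) / "eval_done"
    save_eval_sketch(done, {"artifact_stem": "pca_2d", "accuracy": 0.75})

    # Simulate an interrupted write: payload present, sentinel missing.
    half = Path(tmp) / "eval_interrupted"
    half.mkdir()
    (half / "eval.json").write_text("{}", encoding="utf-8")

    done_ok = is_complete(done)
    half_ok = is_complete(half)
    round_trip = json.loads((done / "eval.json").read_text(encoding="utf-8"))
```

A consumer that checks for the sentinel before reading can skip directories left behind by interrupted runs.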

coco_pipe.dim_reduction.artifacts.load_fit_artifact(path)

Load a fit artifact directory into a dict.

The compact fit.npz + fit.json layout written by save_fit_artifact() is tried first. If it is absent, the legacy seven-file layout is resolved via artifact_manifest.json, then via a glob fallback, so artifacts written by earlier versions remain loadable.

Parameters:

path (pathlib.Path) – Artifact directory written by save_fit_artifact().

Returns:

A dict with keys embedding, ids, fit, metrics, diagnostics, manifest, path, and embedding_container. The embedding_container is a DataContainer view of the embedding (or None for non-2D embeddings); the embedding and ids arrays remain for direct array access.

Return type:

dict
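
The resolution order described above (compact layout, then manifest, then glob) might look like the sketch below. detect_fit_layout is a hypothetical helper, and the "*embedding*" glob pattern is purely an assumption, because the legacy file names are not listed in this reference.

```python
import tempfile
from pathlib import Path


def detect_fit_layout(path):
    """Hypothetical helper: report which on-disk layout a directory uses."""
    path = Path(path)
    if (path / "fit.npz").is_file() and (path / "fit.json").is_file():
        return "compact"
    if (path / "artifact_manifest.json").is_file():
        return "legacy-manifest"
    # Assumption: legacy files are discoverable by a filename pattern.
    if any(path.glob("*embedding*")):
        return "legacy-glob"
    raise FileNotFoundError(f"No recognisable fit artifact in {path}")


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)

    compact = root / "compact"
    compact.mkdir()
    (compact / "fit.npz").touch()
    (compact / "fit.json").touch()

    legacy = root / "legacy"
    legacy.mkdir()
    (legacy / "artifact_manifest.json").touch()

    globbed = root / "globbed"
    globbed.mkdir()
    (globbed / "run1_embedding.npy").touch()

    layouts = [detect_fit_layout(p) for p in (compact, legacy, globbed)]
```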

coco_pipe.dim_reduction.artifacts.load_fit_runs(path)

Load a fit-runs inventory JSON file into a list of dicts.

Parameters:

path (pathlib.Path) – Path to the JSON file written by update_runs().

Return type:

list of dict

Raises:

RuntimeError – If path does not exist.
ValueError – If the file does not contain a JSON list.
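
The two error conditions above can be sketched as follows. load_runs_sketch is a hypothetical re-implementation for illustration only; the error messages are invented, and only the exception types follow the documented contract.

```python
import json
import tempfile
from pathlib import Path


def load_runs_sketch(path):
    """Hypothetical stand-in mirroring the documented error behaviour."""
    path = Path(path)
    if not path.exists():
        raise RuntimeError(f"Runs inventory not found: {path}")
    with open(path, encoding="utf-8") as fh:
        data = json.load(fh)
    if not isinstance(data, list):
        raise ValueError(f"Expected a JSON list in {path}")
    return data


with tempfile.TemporaryDirectory() as tmp:
    good = Path(tmp) / "fit_runs.json"
    good.write_text(json.dumps([{"reducer": "pca"}]), encoding="utf-8")

    bad = Path(tmp) / "not_a_list.json"
    bad.write_text(json.dumps({"reducer": "pca"}), encoding="utf-8")

    runs = load_runs_sketch(good)

    # Collect the exception type raised for each problematic input.
    outcomes = []
    for candidate in (bad, Path(tmp) / "missing.json"):
        try:
            load_runs_sketch(candidate)
        except (RuntimeError, ValueError) as exc:
            outcomes.append(type(exc).__name__)
```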

coco_pipe.dim_reduction.artifacts.update_runs(path, record, key_fields)

Upsert record into a JSON runs inventory at path.

If path already exists, its list is loaded and any record whose key_fields values match those of record is replaced (upsert semantics). If no match is found, record is appended. The list is then sorted by key_fields, followed by common run-taxonomy fields, and written back.

Parameters:

path (pathlib.Path) – Target JSON file. The parent directory is created if necessary.
record (dict[str, Any]) – Dict to upsert. Must contain all fields listed in key_fields.
key_fields (collections.abc.Sequence[str]) – Ordered sequence of field names that uniquely identify a run.

Return type:

None
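
The upsert semantics can be sketched with the standard library alone. upsert_record is a hypothetical helper: unlike update_runs() it returns the updated list for inspection, and it sorts by key_fields only, since the additional run-taxonomy sort fields are not enumerated here. The field names reducer and n_components are likewise illustrative.

```python
import json
import tempfile
from pathlib import Path


def upsert_record(path, record, key_fields):
    """Hypothetical sketch: replace the matching record or append, then sort."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    runs = json.loads(path.read_text(encoding="utf-8")) if path.exists() else []

    # Drop any existing record with the same key, then add the new one.
    key = tuple(record[field] for field in key_fields)
    runs = [r for r in runs if tuple(r.get(field) for field in key_fields) != key]
    runs.append(record)

    # Assumption: sort on key_fields only; values are compared as strings
    # so that mixed types do not raise.
    runs.sort(key=lambda r: tuple(str(r.get(field)) for field in key_fields))
    path.write_text(json.dumps(runs, indent=2), encoding="utf-8")
    return runs


KEY_FIELDS = ("reducer", "n_components")  # illustrative key, not FIT_RUN_KEY_FIELDS

with tempfile.TemporaryDirectory() as tmp:
    inventory = Path(tmp) / "inventory" / "fit_runs.json"
    upsert_record(inventory, {"reducer": "umap", "n_components": 2}, KEY_FIELDS)
    upsert_record(inventory, {"reducer": "pca", "n_components": 2}, KEY_FIELDS)
    final = upsert_record(
        inventory,
        {"reducer": "pca", "n_components": 2, "trustworthiness": 0.95},
        KEY_FIELDS,
    )
```

Re-running a fit with the same key therefore updates its row in place instead of growing the inventory.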

coco_pipe.dim_reduction.artifacts.build_record(payload, artifact_path, output_root, metric_columns, metrics_payload=None, error=None)

Build a flat run-inventory record from an artifact payload dict.

Used for both fit and eval records — pass FIT_METRIC_COLUMNS or EVAL_METRIC_COLUMNS as metric_columns.

Strips bulky sub-dicts (metrics, records, metadata, artifacts) from payload, adds a relative artifact_path, promotes each metric in metric_columns to a top-level float (or nan), and stamps the record with a status of "success" or "failed".

Parameters:

payload (dict[str, Any])
artifact_path (pathlib.Path)
output_root (pathlib.Path)
metric_columns (collections.abc.Sequence[str])
metrics_payload (dict[str, Any] | None)
error (str | None)

Return type:

dict[str, Any]
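
The flattening steps above can be sketched as follows. build_record_sketch is a hypothetical re-implementation; the fallback to payload["metrics"], the "error" field name, and the metric name "continuity" are assumptions made for illustration.

```python
import math
from pathlib import Path

BULKY_KEYS = ("metrics", "records", "metadata", "artifacts")


def build_record_sketch(payload, artifact_path, output_root, metric_columns,
                        metrics_payload=None, error=None):
    """Hypothetical sketch of flattening a payload into an inventory row."""
    # Strip the bulky sub-dicts so the row stays flat.
    record = {k: v for k, v in payload.items() if k not in BULKY_KEYS}
    record["artifact_path"] = Path(artifact_path).relative_to(output_root).as_posix()

    # Assumption: fall back to payload["metrics"] when no metrics are passed.
    metrics = metrics_payload if metrics_payload is not None else payload.get("metrics", {})
    for name in metric_columns:
        try:
            record[name] = float(metrics.get(name))
        except (TypeError, ValueError):
            record[name] = math.nan  # missing or non-numeric metric

    record["status"] = "failed" if error else "success"
    if error:
        record["error"] = error  # assumption: field name is illustrative
    return record


record = build_record_sketch(
    payload={"reducer": "pca", "metrics": {"trustworthiness": 0.9}, "metadata": {}},
    artifact_path=Path("out/fits/pca_2d"),
    output_root=Path("out"),
    metric_columns=("trustworthiness", "continuity"),
)
```

Promoting every column in metric_columns, with nan for absent values, gives all rows the same keys, which keeps the inventory easy to load into a table.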

coco_pipe.dim_reduction.artifacts.write_run_status(output_root, fit_runs_path, eval_runs_path, *, run_summary_path=None, fatal_error=None, report_path=None, run_metadata=None)

Write run_summary.json and a run-marker sentinel to output_root.

The marker file is one of _RUN_SUCCESS, _RUN_PARTIAL, or _RUN_FAILED. Any pre-existing marker files are removed first so only one is present at a time.

Run status logic:

success — at least one successful fit or eval, no failures
partial — both successes and failures present
failed — only failures, or fatal_error is set with no successes

Parameters:

output_root (pathlib.Path) – Directory that receives run_summary.json and the marker file.
fit_runs_path (pathlib.Path) – Path to the fit runs JSON inventory (may not exist yet).
eval_runs_path (pathlib.Path) – Path to the eval runs JSON inventory (may not exist yet).
run_summary_path (pathlib.Path | None) – Optional explicit path for the summary JSON. Defaults to output_root / "run_summary.json".
fatal_error (str | None) – If set, the run is marked as at least partially failed.
report_path (pathlib.Path | None) – Path to the generated HTML report, if any.
run_metadata (dict[str, Any] | None) – Extra key/value pairs merged into the summary payload.

Returns:

The summary payload written to disk.

Return type:

dict
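
The status rules and the single-marker guarantee can be sketched as below. resolve_status and write_marker are hypothetical helpers. Treating a run with no records and no fatal_error as failed is an assumption, since that case is not covered by the rules above; treating fatal_error together with successes as partial follows the fatal_error parameter description.

```python
import tempfile
from pathlib import Path

MARKERS = {
    "success": "_RUN_SUCCESS",
    "partial": "_RUN_PARTIAL",
    "failed": "_RUN_FAILED",
}


def resolve_status(n_success, n_failed, fatal_error=None):
    """Hypothetical sketch of the documented success/partial/failed rules."""
    any_failure = n_failed > 0 or fatal_error is not None
    if n_success > 0 and not any_failure:
        return "success"
    if n_success > 0:
        return "partial"
    # Assumption: an empty run (no records at all) is also reported as failed.
    return "failed"


def write_marker(output_root, status):
    """Remove stale markers, then write exactly one for the given status."""
    output_root = Path(output_root)
    output_root.mkdir(parents=True, exist_ok=True)
    for name in MARKERS.values():
        (output_root / name).unlink(missing_ok=True)
    (output_root / MARKERS[status]).touch()


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "output"
    write_marker(root, resolve_status(0, 4))  # first attempt fails
    write_marker(root, resolve_status(3, 0))  # rerun succeeds
    markers_present = sorted(p.name for p in root.iterdir())
```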

coco_pipe.dim_reduction.artifacts.build_availability_record(*, scope, condition, unit_spec, container, requested_components, valid_components)

Build a data-availability record from container shape information.

Captures matrix dimensions and which n_components values are feasible vs skipped, for inclusion in run-level provenance metadata.

Parameters:

scope (str)
condition (str)
unit_spec (dict[str, Any] | None)
container (coco_pipe.io.structures.DataContainer)
requested_components (collections.abc.Sequence[int])
valid_components (collections.abc.Sequence[int])

Return type:

dict[str, Any]
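
A rough sketch of such a record is shown below. availability_sketch is a hypothetical helper that takes a plain (n_obs, n_features) shape tuple in place of a DataContainer and omits unit_spec. All output field names, including skipped_components, are assumptions rather than the library's actual schema.

```python
def availability_sketch(*, scope, condition, shape, requested_components, valid_components):
    """Hypothetical sketch: record matrix size and feasible vs skipped sizes."""
    n_obs, n_features = shape
    valid = sorted({int(n) for n in valid_components})
    # Anything requested but not valid is reported as skipped.
    skipped = sorted({int(n) for n in requested_components} - set(valid))
    return {
        "scope": scope,
        "condition": condition,
        "n_obs": int(n_obs),
        "n_features": int(n_features),
        "requested_components": [int(n) for n in requested_components],
        "valid_components": valid,
        "skipped_components": skipped,
    }


availability = availability_sketch(
    scope="subject",
    condition="rest",
    shape=(5, 3),
    requested_components=[2, 3, 10],
    valid_components=[2, 3],
)
```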