coco_pipe.dim_reduction.artifacts#

Artifact persistence helpers for dimensionality-reduction runs.

This module owns the save/load/update layer that the dim-reduction pipeline uses to checkpoint fit results and post-hoc eval results to disk. Keeping these functions in coco-pipe (rather than in a consumer project) means any project that runs the dim-reduction pipeline can share the same artifact format without duplicating the serialization logic.

Public API#

save_fit_artifact

Write embedding, ids, fit payload, metrics, and diagnostics to a directory and stamp it with _SUCCESS.

save_eval_artifact

Write an eval payload to a directory and stamp it with _SUCCESS.

load_fit_artifact

Read a fit artifact directory back into a dict.

load_fit_runs

Read a JSON runs inventory file back into a list of dicts.

update_runs

Upsert a record into a JSON runs inventory, keeping it sorted.

Constants#

SEPARATION_METRIC_KEY

Canonical key for the logistic-regression separation metric returned by coco_pipe.dim_reduction.evaluation.core.evaluate_embedding().

FIT_METRIC_COLUMNS

Ordered list of geometry quality metric names written by fit artifacts.

EVAL_METRIC_COLUMNS

Ordered list of eval metric names written by eval artifacts.

FIT_RUN_KEY_FIELDS

Fields that uniquely identify a fit run in the runs inventory.

EVAL_RUN_KEY_FIELDS

Fields that uniquely identify an eval run in the runs inventory.

Functions#

save_fit_artifact(path, embedding, ids, fit_payload, ...)

Write a fit artifact to path and stamp it with _SUCCESS.

save_eval_artifact(path, eval_payload)

Write an eval artifact to path and stamp it with _SUCCESS.

load_fit_artifact(path)

Load a fit artifact directory into a dict.

load_fit_runs(path)

Load a fit-runs inventory JSON file into a list of dicts.

update_runs(path, record, key_fields)

Upsert record into a JSON runs inventory at path.

build_record(payload, artifact_path, output_root, ...)

Build a flat run-inventory record from an artifact payload dict.

write_run_status(output_root, fit_runs_path, ...[, ...])

Write run_summary.json and a run-marker sentinel to output_root.

build_availability_record(*, scope, condition, ...)

Build a data-availability record from container shape information.

Module Contents#

coco_pipe.dim_reduction.artifacts.save_fit_artifact(path, embedding, ids, fit_payload, metrics_payload, diagnostics)#

Write a fit artifact to path and stamp it with _SUCCESS.

The artifact directory receives three files (a compact layout that keeps the inode count low for high-cardinality analysis modes):

  • fit.npz — embedding, ids, and reducer diagnostics in one archive

  • fit.json — the fit payload and geometry quality metrics

  • _SUCCESS — sentinel that marks the artifact as complete

load_fit_artifact() also reads the older seven-file layout, so existing artifacts remain loadable.

Parameters:
  • path (pathlib.Path) – Target directory. Created (including parents) if it does not exist.

  • embedding (numpy.ndarray) – Array of shape (n_obs, n_dims).

  • ids (numpy.ndarray) – 1-D array of observation identifiers aligned with embedding.

  • fit_payload (dict[str, Any]) – Serialisable dict describing the fit (reducer, config, provenance, …). The key "artifact_stem" is recorded for provenance.

  • metrics_payload (dict[str, Any]) – Serialisable dict of geometry quality metrics (trustworthiness, …).

  • diagnostics (dict[str, Any]) – Serialisable dict of reducer-internal diagnostics. May be empty.

Return type:

None
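The three-file layout can be illustrated with a minimal standalone sketch. This is not coco-pipe's implementation. The helper name `write_compact_fit_artifact`, the array names inside `fit.npz`, the JSON-string encoding of the diagnostics, and the `"fit"` / `"metrics"` keys inside `fit.json` are all assumptions made for illustration. The sketch also assumes the `_SUCCESS` sentinel is written last, so that its presence implies the other files are complete.

```python
import json
import tempfile
from pathlib import Path

import numpy as np


def write_compact_fit_artifact(path, embedding, ids, fit_payload,
                               metrics_payload, diagnostics):
    """Hypothetical stand-in for save_fit_artifact; internal keys are assumed."""
    path = Path(path)
    path.mkdir(parents=True, exist_ok=True)  # create the directory and parents

    # One archive for the arrays. Diagnostics are stored as a JSON string
    # (an assumption) so that nothing needs to be pickled.
    np.savez_compressed(
        path / "fit.npz",
        embedding=np.asarray(embedding),
        ids=np.asarray(ids),
        diagnostics=np.array(json.dumps(diagnostics)),
    )

    # One JSON file for the fit payload and the geometry quality metrics.
    (path / "fit.json").write_text(
        json.dumps({"fit": fit_payload, "metrics": metrics_payload}, indent=2)
    )

    # Assumed ordering: the sentinel is created after everything else.
    (path / "_SUCCESS").touch()


with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "runs" / "pca_2d"
    write_compact_fit_artifact(
        target,
        embedding=np.zeros((4, 2)),
        ids=np.array(["a", "b", "c", "d"]),
        fit_payload={"artifact_stem": "pca_2d", "reducer": "pca"},
        metrics_payload={"trustworthiness": 0.9},
        diagnostics={},
    )
    written = sorted(p.name for p in target.iterdir())
```

After the call, `written` holds the three file names described above: `_SUCCESS`, `fit.json`, and `fit.npz`.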

coco_pipe.dim_reduction.artifacts.save_eval_artifact(path, eval_payload)#

Write an eval artifact to path and stamp it with _SUCCESS.

The artifact directory receives two files:

  • eval.json — the full eval payload

  • _SUCCESS — sentinel that marks the artifact as complete

_load_eval_payload() also reads the older <stem>_eval.json layout.

Parameters:
  • path (pathlib.Path) – Target directory. Created (including parents) if it does not exist.

  • eval_payload (dict[str, Any]) – Serialisable dict describing the post-hoc evaluation results. The key "artifact_stem" is recorded for provenance when present.

Return type:

None
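Because `_SUCCESS` marks a finished artifact, a consumer can skip directories that lack it. The following is a minimal sketch of that pattern, assuming the sentinel is written after `eval.json`. The helpers `write_eval_artifact` and `is_complete`, and the `"accuracy"` key in the example payload, are hypothetical and not part of coco-pipe.

```python
import json
import tempfile
from pathlib import Path


def write_eval_artifact(path, eval_payload):
    """Hypothetical stand-in for save_eval_artifact."""
    path = Path(path)
    path.mkdir(parents=True, exist_ok=True)
    (path / "eval.json").write_text(json.dumps(eval_payload, indent=2))
    (path / "_SUCCESS").touch()  # assumed to be written after the payload


def is_complete(path):
    """Treat a directory as a finished artifact only if the sentinel exists."""
    return (Path(path) / "_SUCCESS").is_file()


with tempfile.TemporaryDirectory() as tmp:
    done = Path(tmp) / "eval_done"
    write_eval_artifact(done, {"artifact_stem": "pca_2d", "accuracy": 0.8})

    # Simulate an interrupted write: payload present, sentinel missing.
    interrupted = Path(tmp) / "eval_interrupted"
    interrupted.mkdir()
    (interrupted / "eval.json").write_text("{}")

    status = {p.name: is_complete(p) for p in sorted(Path(tmp).iterdir())}
```

Here `status` maps `eval_done` to `True` and `eval_interrupted` to `False`.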

coco_pipe.dim_reduction.artifacts.load_fit_artifact(path)#

Load a fit artifact directory into a dict.

The compact fit.npz + fit.json layout written by save_fit_artifact() is tried first. If it is absent, the legacy seven-file layout is resolved through artifact_manifest.json, falling back to a glob search, so artifacts written by earlier versions remain loadable.

Parameters:

path (pathlib.Path) – Artifact directory written by save_fit_artifact().

Returns:

A dict with the keys embedding, ids, fit, metrics, diagnostics, manifest, path, and embedding_container. embedding_container is a DataContainer view of the embedding (or None for non-2D embeddings); the embedding and ids arrays are kept alongside it for direct array access.

Return type:

dict
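The resolution order described for this function (compact layout first, then the manifest, then a glob search) can be sketched as a small layout check. `detect_fit_layout` and the three labels it returns are hypothetical; the sketch only decides which strategy would apply and does not read any legacy files, whose names are not documented here.

```python
import tempfile
from pathlib import Path


def detect_fit_layout(path):
    """Hypothetical helper: report which loading strategy would apply."""
    path = Path(path)
    if (path / "fit.npz").is_file() and (path / "fit.json").is_file():
        return "compact"
    if (path / "artifact_manifest.json").is_file():
        return "legacy-manifest"
    return "legacy-glob"  # last resort: search the directory by pattern


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)

    compact = root / "compact"
    compact.mkdir()
    (compact / "fit.npz").touch()
    (compact / "fit.json").touch()

    with_manifest = root / "with_manifest"
    with_manifest.mkdir()
    (with_manifest / "artifact_manifest.json").touch()

    bare = root / "bare"
    bare.mkdir()

    layouts = [detect_fit_layout(p) for p in (compact, with_manifest, bare)]
```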

coco_pipe.dim_reduction.artifacts.load_fit_runs(path)#

Load a fit-runs inventory JSON file into a list of dicts.

Parameters:

path (pathlib.Path) – Path to the JSON file written by update_runs().

Return type:

list of dict

Raises:

coco_pipe.dim_reduction.artifacts.update_runs(path, record, key_fields)#

Upsert record into a JSON runs inventory at path.

If path already exists, the list is loaded and any existing entry whose key_fields values match those of record is replaced (upsert semantics). If no match is found, record is appended. The list is then sorted by key_fields, followed by common run-taxonomy fields, and written back.

Parameters:
  • path (pathlib.Path) – Target JSON file. The parent directory is created if necessary.

  • record (dict[str, Any]) – Dict to upsert. Must contain all fields listed in key_fields.

  • key_fields (collections.abc.Sequence[str]) – Ordered sequence of field names that uniquely identify a run.

Return type:

None
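The upsert semantics can be sketched in a few lines of standard-library Python. `upsert_run` is a hypothetical stand-in, not the library function: it sorts by key_fields only and omits the additional run-taxonomy sort fields, and the key fields `reducer` and `n_components` are illustrative.

```python
import json
import tempfile
from pathlib import Path


def upsert_run(path, record, key_fields):
    """Hypothetical stand-in for update_runs; sorts by key_fields only."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    runs = json.loads(path.read_text()) if path.exists() else []

    # Raises KeyError if the record lacks one of the key fields.
    key = tuple(record[f] for f in key_fields)

    # Drop any existing entry with the same key, then add the new record.
    runs = [r for r in runs if tuple(r.get(f) for f in key_fields) != key]
    runs.append(record)

    # Compare as strings so mixed value types still sort without errors.
    runs.sort(key=lambda r: tuple(str(r.get(f)) for f in key_fields))
    path.write_text(json.dumps(runs, indent=2))


with tempfile.TemporaryDirectory() as tmp:
    inventory = Path(tmp) / "inventory" / "fit_runs.json"
    keys = ("reducer", "n_components")  # illustrative key fields
    upsert_run(inventory, {"reducer": "umap", "n_components": 2, "status": "failed"}, keys)
    upsert_run(inventory, {"reducer": "pca", "n_components": 2, "status": "success"}, keys)
    # Same key as the first call, so the failed entry is replaced.
    upsert_run(inventory, {"reducer": "umap", "n_components": 2, "status": "success"}, keys)
    runs = json.loads(inventory.read_text())
```

The inventory ends up with two entries, `pca` before `umap`, and the re-run `umap` entry carries the newer status.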

coco_pipe.dim_reduction.artifacts.build_record(payload, artifact_path, output_root, metric_columns, metrics_payload=None, error=None)#

Build a flat run-inventory record from an artifact payload dict.

Used for both fit and eval records — pass FIT_METRIC_COLUMNS or EVAL_METRIC_COLUMNS as metric_columns.

Strips bulky sub-dicts (metrics, records, metadata, artifacts) from payload, adds a relative artifact_path, promotes each metric in metric_columns to a top-level float (or nan), and stamps the record with a status of "success" or "failed".

Parameters:
Return type:

dict[str, Any]
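The flattening steps described above can be sketched as follows. `flatten_record` is a hypothetical stand-in for build_record. Two details are assumptions rather than documented behaviour: metrics are taken from metrics_payload when given and from `payload["metrics"]` otherwise, and a failure message is stored under an `"error"` key. The metric name `continuity` is illustrative.

```python
import math
from pathlib import Path


def flatten_record(payload, artifact_path, output_root, metric_columns,
                   metrics_payload=None, error=None):
    """Hypothetical stand-in for build_record."""
    bulky = {"metrics", "records", "metadata", "artifacts"}
    record = {k: v for k, v in payload.items() if k not in bulky}

    # Store the artifact location relative to the run's output root.
    record["artifact_path"] = Path(artifact_path).relative_to(output_root).as_posix()

    # Assumption: prefer metrics_payload, else fall back to payload["metrics"].
    metrics = metrics_payload if metrics_payload is not None else payload.get("metrics", {})
    for name in metric_columns:
        value = metrics.get(name)
        record[name] = float(value) if value is not None else math.nan

    record["status"] = "failed" if error else "success"
    if error:
        record["error"] = str(error)  # assumed field name
    return record


record = flatten_record(
    payload={"reducer": "pca", "metrics": {"trustworthiness": 0.93}, "metadata": {"n": 100}},
    artifact_path=Path("out/fits/pca_2d"),
    output_root=Path("out"),
    metric_columns=["trustworthiness", "continuity"],
)
```

The resulting record keeps `reducer`, gains `artifact_path` as `fits/pca_2d`, has `trustworthiness` as a float, and holds `nan` for the missing `continuity` metric.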

coco_pipe.dim_reduction.artifacts.write_run_status(output_root, fit_runs_path, eval_runs_path, *, run_summary_path=None, fatal_error=None, report_path=None, run_metadata=None)#

Write run_summary.json and a run-marker sentinel to output_root.

The marker file is one of _RUN_SUCCESS, _RUN_PARTIAL, or _RUN_FAILED. Any pre-existing marker files are removed first so only one is present at a time.

Run status logic:

  • success — at least one successful fit or eval, no failures

  • partial — both successes and failures present

  • failed — only failures, or fatal_error is set with no successes

Parameters:
  • output_root (pathlib.Path) – Directory that receives run_summary.json and the marker file.

  • fit_runs_path (pathlib.Path) – Path to the fit runs JSON inventory (may not exist yet).

  • eval_runs_path (pathlib.Path) – Path to the eval runs JSON inventory (may not exist yet).

  • run_summary_path (pathlib.Path | None) – Optional explicit path for the summary JSON. Defaults to output_root / "run_summary.json".

  • fatal_error (str | None) – If set, the run counts as having a failure: it is marked partial when some fits or evals succeeded, and failed otherwise.

  • report_path (pathlib.Path | None) – Path to the generated HTML report, if any.

  • run_metadata (dict[str, Any] | None) – Extra key/value pairs merged into the summary payload.

Returns:

The summary payload written to disk.

Return type:

dict
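The run status rules and the single-marker guarantee can be sketched as below. `resolve_status` and `write_marker` are hypothetical helpers, not coco-pipe functions. Two cases are assumptions: a fatal_error alongside successes is treated as partial, and a run with no successes at all (including an empty run) is treated as failed.

```python
import tempfile
from pathlib import Path

MARKERS = ("_RUN_SUCCESS", "_RUN_PARTIAL", "_RUN_FAILED")


def resolve_status(n_success, n_failed, fatal_error=None):
    """Hypothetical helper applying the status rules listed above."""
    has_failure = n_failed > 0 or fatal_error is not None
    if n_success > 0 and not has_failure:
        return "success"
    if n_success > 0 and has_failure:
        return "partial"
    return "failed"  # assumption: also covers a run with no records at all


def write_marker(output_root, status):
    """Remove stale markers, then write the one marker for this status."""
    output_root = Path(output_root)
    output_root.mkdir(parents=True, exist_ok=True)
    for name in MARKERS:
        (output_root / name).unlink(missing_ok=True)  # Python 3.8+
    marker = output_root / f"_RUN_{status.upper()}"
    marker.touch()
    return marker


with tempfile.TemporaryDirectory() as tmp:
    write_marker(tmp, "failed")  # marker left over from an earlier run
    write_marker(tmp, resolve_status(n_success=3, n_failed=1))
    present = sorted(p.name for p in Path(tmp).iterdir())
```

Only `_RUN_PARTIAL` remains in the directory, since the earlier `_RUN_FAILED` marker is removed before the new one is written.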

coco_pipe.dim_reduction.artifacts.build_availability_record(*, scope, condition, unit_spec, container, requested_components, valid_components)#

Build a data-availability record from container shape information.

Captures matrix dimensions and which n_components values are feasible vs skipped, for inclusion in run-level provenance metadata.

Parameters:
Return type:

dict[str, Any]