coco_pipe.dim_reduction.artifacts
Artifact persistence helpers for dimensionality-reduction runs.
This module owns the save/load/update layer that the dim-reduction pipeline uses to checkpoint fit results and post-hoc eval results to disk. Keeping these functions in coco-pipe (rather than in a consumer project) means any project that runs the dim-reduction pipeline can share the same artifact format without duplicating the serialization logic.
Public API
- save_fit_artifact
Write embedding, ids, fit payload, metrics, and diagnostics to a directory and stamp it with _SUCCESS.
- save_eval_artifact
Write an eval payload to a directory and stamp it with _SUCCESS.
- load_fit_artifact
Read a fit artifact directory back into a dict.
- load_fit_runs
Read a JSON runs inventory file back into a list of dicts.
- update_runs
Upsert a record into a JSON runs inventory, keeping it sorted.
Constants
- SEPARATION_METRIC_KEY
Canonical key for the logistic-regression separation metric returned by coco_pipe.dim_reduction.evaluation.core.evaluate_embedding().
- FIT_METRIC_COLUMNS
Ordered list of geometry quality metric names written by fit artifacts.
- EVAL_METRIC_COLUMNS
Ordered list of eval metric names written by eval artifacts.
- FIT_RUN_KEY_FIELDS
Fields that uniquely identify a fit run in the runs inventory.
- EVAL_RUN_KEY_FIELDS
Fields that uniquely identify an eval run in the runs inventory.
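Functions
| Function | Summary |
| --- | --- |
| save_fit_artifact() | Write a fit artifact to path and stamp it with _SUCCESS. |
| save_eval_artifact() | Write an eval artifact to path and stamp it with _SUCCESS. |
| load_fit_artifact() | Load a fit artifact directory into a dict. |
| load_fit_runs() | Load a fit-runs inventory JSON file into a list of dicts. |
| update_runs() | Upsert record into a JSON runs inventory at path. |
| build_record() | Build a flat run-inventory record from an artifact payload dict. |
| write_run_status() | Write run_summary.json and a run-marker sentinel to output_root. |
| build_availability_record() | Build a data-availability record from container shape information. |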
Module Contents
- coco_pipe.dim_reduction.artifacts.save_fit_artifact(path, embedding, ids, fit_payload, metrics_payload, diagnostics)
Write a fit artifact to path and stamp it with _SUCCESS.
The artifact directory receives three files (a compact layout that keeps the inode count low for high-cardinality analysis modes):
- fit.npz: embedding, ids, and reducer diagnostics in one archive
- fit.json: the fit payload and geometry quality metrics
- _SUCCESS: sentinel that marks the artifact as complete
load_fit_artifact() also reads the older seven-file layout, so existing artifacts remain loadable.
- Parameters:
path (pathlib.Path) – Target directory. Created (including parents) if it does not exist.
embedding (numpy.ndarray) – Array of shape (n_obs, n_dims).
ids (numpy.ndarray) – 1-D array of observation identifiers aligned with embedding.
fit_payload (dict[str, Any]) – Serialisable dict describing the fit (reducer, config, provenance, …). The key "artifact_stem" is recorded for provenance.
metrics_payload (dict[str, Any]) – Serialisable dict of geometry quality metrics (trustworthiness, …).
diagnostics (dict[str, Any]) – Serialisable dict of reducer-internal diagnostics. May be empty.
- Return type:
None
- coco_pipe.dim_reduction.artifacts.save_eval_artifact(path, eval_payload)
Write an eval artifact to path and stamp it with _SUCCESS.
The artifact directory receives two files:
- eval.json: the full eval payload
- _SUCCESS: sentinel that marks the artifact as complete
_load_eval_payload() also reads the older <stem>_eval.json layout.
- Parameters:
path (pathlib.Path) – Target directory. Created (including parents) if it does not exist.
eval_payload (dict[str, Any]) – Serialisable dict describing the post-hoc evaluation results. The key "artifact_stem" is recorded for provenance when present.
- Return type:
None
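The two-file layout above can be illustrated with a minimal, standard-library-only sketch. It is not the coco-pipe implementation: the function names `write_eval_artifact_sketch` and `is_complete` are hypothetical, and writing the sentinel last (plus the JSON indentation) is an assumption about how a completion marker is typically used.

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Dict


def write_eval_artifact_sketch(path: Path, eval_payload: Dict[str, Any]) -> None:
    """Hypothetical stand-in for save_eval_artifact: payload first, sentinel last."""
    path.mkdir(parents=True, exist_ok=True)
    # Assumption: the payload is written before the sentinel, so a crash
    # mid-write never leaves a directory that looks complete.
    (path / "eval.json").write_text(json.dumps(eval_payload, indent=2))
    (path / "_SUCCESS").touch()


def is_complete(path: Path) -> bool:
    """A consumer can treat a directory without _SUCCESS as unfinished."""
    return (path / "_SUCCESS").is_file()


with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "runs" / "example_eval"
    before = is_complete(target)
    write_eval_artifact_sketch(target, {"artifact_stem": "example", "score": 0.9})
    after = is_complete(target)
    roundtrip = json.loads((target / "eval.json").read_text())
```

Checking for the sentinel before reading lets downstream code skip directories left behind by interrupted runs.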
- coco_pipe.dim_reduction.artifacts.load_fit_artifact(path)
Load a fit artifact directory into a dict.
The compact fit.npz + fit.json layout written by save_fit_artifact() is read first; if absent, the legacy seven-file layout is resolved via artifact_manifest.json, then a glob fallback, so artifacts written by earlier versions remain loadable.
- Parameters:
path (pathlib.Path) – Artifact directory written by save_fit_artifact().
- Returns:
embedding, ids, fit, metrics, diagnostics, manifest, path, and embedding_container. The embedding_container is a DataContainer view of the embedding (or None for non-2D embeddings); the embedding and ids arrays remain for direct array access.
- Return type:
dict with keys
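The resolution order described for this function (compact layout, then manifest, then glob) might be sketched as follows. This is only an illustration of the fallback order: `detect_fit_layout` and its return labels are hypothetical, and the glob patterns are placeholders, since the real legacy file names are not documented here.

```python
import tempfile
from pathlib import Path


def detect_fit_layout(path: Path) -> str:
    """Return which layout a fit artifact directory appears to use."""
    # 1. Compact layout: both files written by save_fit_artifact() are present.
    if (path / "fit.npz").is_file() and (path / "fit.json").is_file():
        return "compact"
    # 2. Legacy layout resolved through its manifest.
    if (path / "artifact_manifest.json").is_file():
        return "legacy-manifest"
    # 3. Hypothetical last resort: these patterns are placeholders, not the
    #    patterns coco-pipe actually uses.
    if any(path.glob("*.json")) or any(path.glob("*.npy")):
        return "legacy-glob"
    raise FileNotFoundError(f"no fit artifact files found in {path}")


with tempfile.TemporaryDirectory() as tmp:
    compact = Path(tmp) / "compact"
    compact.mkdir()
    (compact / "fit.npz").touch()
    (compact / "fit.json").touch()

    legacy = Path(tmp) / "legacy"
    legacy.mkdir()
    (legacy / "artifact_manifest.json").touch()

    layouts = (detect_fit_layout(compact), detect_fit_layout(legacy))
```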
- coco_pipe.dim_reduction.artifacts.load_fit_runs(path)
Load a fit-runs inventory JSON file into a list of dicts.
- Parameters:
path (pathlib.Path) – Path to the JSON file written by update_runs().
- Return type:
- Raises:
RuntimeError – If path does not exist.
ValueError – If the file does not contain a JSON list.
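The documented error behaviour can be mirrored in a small sketch. `load_runs_sketch` is a hypothetical name and the error messages are invented; only the exception types and the conditions that trigger them come from the description above.

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Dict, List


def load_runs_sketch(path: Path) -> List[Dict[str, Any]]:
    """Read a JSON runs inventory, rejecting missing files and non-list content."""
    if not path.exists():
        raise RuntimeError(f"runs inventory not found: {path}")
    data = json.loads(path.read_text())
    if not isinstance(data, list):
        raise ValueError(f"expected a JSON list in {path}, got {type(data).__name__}")
    return data


with tempfile.TemporaryDirectory() as tmp:
    good = Path(tmp) / "fit_runs.json"
    good.write_text(json.dumps([{"status": "success"}]))
    bad = Path(tmp) / "not_a_list.json"
    bad.write_text(json.dumps({"status": "success"}))

    runs = load_runs_sketch(good)

    try:
        load_runs_sketch(bad)
        bad_error = None
    except ValueError as exc:
        bad_error = type(exc).__name__

    try:
        load_runs_sketch(Path(tmp) / "missing.json")
        missing_error = None
    except RuntimeError as exc:
        missing_error = type(exc).__name__
```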
- coco_pipe.dim_reduction.artifacts.update_runs(path, record, key_fields)
Upsert record into a JSON runs inventory at path.
If path already exists, the list is loaded and the record whose key_fields values match those of record is replaced (upsert semantics). If no match is found, record is appended. The list is then sorted by key_fields followed by common run-taxonomy fields and written back.
- Parameters:
path (pathlib.Path) – Target JSON file. The parent directory is created if necessary.
record (dict[str, Any]) – Dict to upsert. Must contain all fields listed in key_fields.
key_fields (collections.abc.Sequence[str]) – Ordered sequence of field names that uniquely identify a run.
- Return type:
None
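The upsert semantics can be sketched with the standard library alone. This is a simplified illustration, not the coco-pipe code: `upsert_run_sketch` is a hypothetical name, the key fields `reducer` and `n_components` are made up for the example, and the sort uses only the key fields (the additional run-taxonomy sort fields are not listed in this page, so they are omitted).

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Dict, Sequence


def upsert_run_sketch(path: Path, record: Dict[str, Any], key_fields: Sequence[str]) -> None:
    """Replace the record with matching key fields, or append if none matches."""
    runs = json.loads(path.read_text()) if path.exists() else []
    # Raises KeyError if the record lacks one of the key fields.
    key = tuple(record[field] for field in key_fields)
    # Drop any existing record with the same key, then add the new one.
    runs = [r for r in runs if tuple(r.get(field) for field in key_fields) != key]
    runs.append(record)
    # Values are stringified so mixed types sort without errors
    # (note that this orders "10" before "2").
    runs.sort(key=lambda r: tuple(str(r.get(field)) for field in key_fields))
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(runs, indent=2))


with tempfile.TemporaryDirectory() as tmp:
    inventory = Path(tmp) / "inventory" / "fit_runs.json"
    keys = ("reducer", "n_components")  # hypothetical key fields
    upsert_run_sketch(inventory, {"reducer": "umap", "n_components": 2, "status": "failed"}, keys)
    upsert_run_sketch(inventory, {"reducer": "pca", "n_components": 2, "status": "success"}, keys)
    # Same key as the first call: replaces the failed record instead of appending.
    upsert_run_sketch(inventory, {"reducer": "umap", "n_components": 2, "status": "success"}, keys)
    runs = json.loads(inventory.read_text())
```

Rerunning a failed configuration therefore overwrites its earlier entry, so the inventory holds one record per unique key.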
- coco_pipe.dim_reduction.artifacts.build_record(payload, artifact_path, output_root, metric_columns, metrics_payload=None, error=None)
Build a flat run-inventory record from an artifact payload dict.
Used for both fit and eval records: pass FIT_METRIC_COLUMNS or EVAL_METRIC_COLUMNS as metric_columns.
Strips bulky sub-dicts (metrics, records, metadata, artifacts) from payload, adds a relative artifact_path, promotes each metric in metric_columns to a top-level float (or nan), and stamps the record with a status of "success" or "failed".
- Parameters:
artifact_path (pathlib.Path)
output_root (pathlib.Path)
metric_columns (collections.abc.Sequence[str])
error (str | None)
- Return type:
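A rough sketch of the flattening steps described for build_record() follows. `build_record_sketch` is a hypothetical name, and several details are assumptions: metrics fall back to `payload["metrics"]` when metrics_payload is not given, the error message is stored under an `error` key, and the relative path uses POSIX separators. The metric name `continuity` is invented for the example.

```python
import math
from pathlib import Path
from typing import Any, Dict, Optional, Sequence

BULKY_KEYS = ("metrics", "records", "metadata", "artifacts")


def build_record_sketch(
    payload: Dict[str, Any],
    artifact_path: Path,
    output_root: Path,
    metric_columns: Sequence[str],
    metrics_payload: Optional[Dict[str, Any]] = None,
    error: Optional[str] = None,
) -> Dict[str, Any]:
    """Flatten a payload into a single inventory row."""
    # Strip the bulky sub-dicts so the record stays flat and small.
    record = {k: v for k, v in payload.items() if k not in BULKY_KEYS}
    record["artifact_path"] = Path(artifact_path).relative_to(output_root).as_posix()
    # Assumption: fall back to the payload's own metrics when none are passed.
    metrics = metrics_payload if metrics_payload is not None else payload.get("metrics", {})
    for name in metric_columns:
        try:
            record[name] = float(metrics.get(name))
        except (TypeError, ValueError):
            record[name] = math.nan  # missing or non-numeric metric
    record["status"] = "failed" if error else "success"
    if error:
        record["error"] = error  # assumption: the message is kept in the record
    return record


record = build_record_sketch(
    {"reducer": "pca", "metrics": {"trustworthiness": 0.93}},
    artifact_path=Path("out/fits/pca_2d"),
    output_root=Path("out"),
    metric_columns=["trustworthiness", "continuity"],
)
```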
- coco_pipe.dim_reduction.artifacts.write_run_status(output_root, fit_runs_path, eval_runs_path, *, run_summary_path=None, fatal_error=None, report_path=None, run_metadata=None)
Write run_summary.json and a run-marker sentinel to output_root.
The marker file is one of _RUN_SUCCESS, _RUN_PARTIAL, or _RUN_FAILED. Any pre-existing marker files are removed first so only one is present at a time.
Run status logic:
- success: at least one successful fit or eval, no failures
- partial: both successes and failures present
- failed: only failures, or fatal_error is set with no successes
- Parameters:
output_root (pathlib.Path) – Directory that receives run_summary.json and the marker file.
fit_runs_path (pathlib.Path) – Path to the fit runs JSON inventory (may not exist yet).
eval_runs_path (pathlib.Path) – Path to the eval runs JSON inventory (may not exist yet).
run_summary_path (pathlib.Path | None) – Optional explicit path for the summary JSON. Defaults to output_root / "run_summary.json".
fatal_error (str | None) – If set, the run is marked as at least partially failed.
report_path (pathlib.Path | None) – Path to the generated HTML report, if any.
run_metadata (dict[str, Any] | None) – Extra key/value pairs merged into the summary payload.
- Returns:
The summary payload written to disk.
- Return type:
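The run status rules and the single-marker guarantee can be sketched as below. The helper names `resolve_run_status` and `write_marker` are hypothetical, and two points are assumptions rather than documented behaviour: a fatal_error alongside successes is treated as partial, and a run with no records at all is treated as failed.

```python
import tempfile
from pathlib import Path
from typing import Optional

MARKERS = {"success": "_RUN_SUCCESS", "partial": "_RUN_PARTIAL", "failed": "_RUN_FAILED"}


def resolve_run_status(n_success: int, n_failed: int, fatal_error: Optional[str] = None) -> str:
    """Map success/failure counts and an optional fatal error to a run status."""
    any_failure = n_failed > 0 or fatal_error is not None
    if n_success > 0 and not any_failure:
        return "success"
    if n_success > 0:
        return "partial"
    # Assumption: no successes (including an empty run) counts as failed.
    return "failed"


def write_marker(output_root: Path, status: str) -> Path:
    """Remove any stale marker, then write the one matching the status."""
    output_root.mkdir(parents=True, exist_ok=True)
    for name in MARKERS.values():
        (output_root / name).unlink(missing_ok=True)  # requires Python 3.8+
    marker = output_root / MARKERS[status]
    marker.touch()
    return marker


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    write_marker(root, resolve_run_status(0, 2))
    write_marker(root, resolve_run_status(3, 0))
    present = sorted(p.name for p in root.iterdir())
```

Clearing stale markers first means a rerun that succeeds does not leave an old failure marker next to the new one.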
- coco_pipe.dim_reduction.artifacts.build_availability_record(*, scope, condition, unit_spec, container, requested_components, valid_components)
Build a data-availability record from container shape information.
Captures matrix dimensions and which n_components values are feasible vs skipped, for inclusion in run-level provenance metadata.
- Parameters:
scope (str)
condition (str)
container (coco_pipe.io.structures.DataContainer)
requested_components (collections.abc.Sequence[int])
valid_components (collections.abc.Sequence[int])
- Return type: