coco_pipe.dim_reduction.artifacts
=================================

.. py:module:: coco_pipe.dim_reduction.artifacts

.. autoapi-nested-parse::

   Artifact persistence helpers for dimensionality-reduction runs.

   This module owns the save/load/update layer that the dim-reduction pipeline
   uses to checkpoint fit results and post-hoc eval results to disk. Keeping
   these functions in coco-pipe (rather than in a consumer project) means any
   project that runs the dim-reduction pipeline can share the same artifact
   format without duplicating the serialization logic.

   Public API
   ----------
   save_fit_artifact
       Write embedding, ids, fit payload, metrics, and diagnostics to a
       directory and stamp it with ``_SUCCESS``.
   save_eval_artifact
       Write an eval payload to a directory and stamp it with ``_SUCCESS``.
   load_fit_artifact
       Read a fit artifact directory back into a dict.
   load_fit_runs
       Read a JSON runs inventory file back into a list of dicts.
   update_runs
       Upsert a record into a JSON runs inventory, keeping it sorted.

   Constants
   ---------
   SEPARATION_METRIC_KEY
       Canonical key for the logistic-regression separation metric returned by
       :func:`coco_pipe.dim_reduction.evaluation.core.evaluate_embedding`.
   FIT_METRIC_COLUMNS
       Ordered list of geometry quality metric names written by fit artifacts.
   EVAL_METRIC_COLUMNS
       Ordered list of eval metric names written by eval artifacts.
   FIT_RUN_KEY_FIELDS
       Fields that uniquely identify a fit run in the runs inventory.
   EVAL_RUN_KEY_FIELDS
       Fields that uniquely identify an eval run in the runs inventory.


Functions
---------

.. autoapisummary::

   coco_pipe.dim_reduction.artifacts.save_fit_artifact
   coco_pipe.dim_reduction.artifacts.save_eval_artifact
   coco_pipe.dim_reduction.artifacts.load_fit_artifact
   coco_pipe.dim_reduction.artifacts.load_fit_runs
   coco_pipe.dim_reduction.artifacts.update_runs
   coco_pipe.dim_reduction.artifacts.build_record
   coco_pipe.dim_reduction.artifacts.write_run_status
   coco_pipe.dim_reduction.artifacts.build_availability_record


Module Contents
---------------

.. py:function:: save_fit_artifact(path, embedding, ids, fit_payload, metrics_payload, diagnostics)

   Write a fit artifact to *path* and stamp it with ``_SUCCESS``.

   The artifact directory receives three files (a compact layout that keeps
   the inode count low for high-cardinality analysis modes):

   - ``fit.npz``: embedding, ids, and reducer diagnostics in one archive
   - ``fit.json``: the fit payload and geometry quality metrics
   - ``_SUCCESS``: sentinel that marks the artifact as complete

   :func:`load_fit_artifact` also reads the older seven-file layout, so
   existing artifacts remain loadable.

   :param path: Target directory. Created (including parents) if it does not
                exist.
   :param embedding: Array of shape ``(n_obs, n_dims)``.
   :param ids: 1-D array of observation identifiers aligned with *embedding*.
   :param fit_payload: Serialisable dict describing the fit (reducer, config,
                       provenance, …). The key ``"artifact_stem"`` is recorded
                       for provenance.
   :param metrics_payload: Serialisable dict of geometry quality metrics
                           (trustworthiness, …).
   :param diagnostics: Serialisable dict of reducer-internal diagnostics. May
                       be empty.


.. py:function:: save_eval_artifact(path, eval_payload)

   Write an eval artifact to *path* and stamp it with ``_SUCCESS``.

   The artifact directory receives two files:

   - ``eval.json``: the full eval payload
   - ``_SUCCESS``: sentinel that marks the artifact as complete

   :func:`_load_eval_payload` also reads the older ``_eval.json`` layout.

   :param path: Target directory. Created (including parents) if it does not
                exist.
   :param eval_payload: Serialisable dict describing the post-hoc evaluation
                        results. The key ``"artifact_stem"`` is recorded for
                        provenance when present.


.. py:function:: load_fit_artifact(path)

   Load a fit artifact directory into a dict.

   The compact ``fit.npz`` + ``fit.json`` layout written by
   :func:`save_fit_artifact` is read first. If it is absent, the legacy
   seven-file layout is resolved via ``artifact_manifest.json``, then a glob
   fallback, so that artifacts written by earlier versions remain loadable.

   :param path: Artifact directory written by :func:`save_fit_artifact`.

   :returns: A dict with the keys ``embedding``, ``ids``, ``fit``,
             ``metrics``, ``diagnostics``, ``manifest``, ``path``, and
             ``embedding_container``. The ``embedding_container`` is a
             :class:`~coco_pipe.io.structures.DataContainer` view of the
             embedding (or ``None`` for non-2D embeddings); the ``embedding``
             and ``ids`` arrays remain for direct array access.
   :rtype: dict


.. py:function:: load_fit_runs(path)

   Load a fit-runs inventory JSON file into a list of dicts.

   :param path: Path to the JSON file written by :func:`update_runs`.

   :rtype: list of dict

   :raises RuntimeError: If *path* does not exist.
   :raises ValueError: If the file does not contain a JSON list.


.. py:function:: update_runs(path, record, key_fields)

   Upsert *record* into a JSON runs inventory at *path*.

   If *path* already exists, the list is loaded and the record whose
   ``key_fields`` values match those of *record* is replaced (upsert
   semantics). If no match is found, *record* is appended. The list is then
   sorted by ``key_fields`` followed by common run-taxonomy fields and written
   back.

   :param path: Target JSON file. The parent directory is created if
                necessary.
   :param record: Dict to upsert. Must contain all fields listed in
                  *key_fields*.
   :param key_fields: Ordered sequence of field names that uniquely identify
                      a run.


.. py:function:: build_record(payload, artifact_path, output_root, metric_columns, metrics_payload = None, error = None)

   Build a flat run-inventory record from an artifact payload dict.

   Used for both fit and eval records: pass :data:`FIT_METRIC_COLUMNS` or
   :data:`EVAL_METRIC_COLUMNS` as *metric_columns*.

   Strips bulky sub-dicts (``metrics``, ``records``, ``metadata``,
   ``artifacts``) from *payload*, adds a relative ``artifact_path``, promotes
   each metric in *metric_columns* to a top-level float (or ``nan``), and
   stamps the record with a ``status`` of ``"success"`` or ``"failed"``.


.. py:function:: write_run_status(output_root, fit_runs_path, eval_runs_path, *, run_summary_path = None, fatal_error = None, report_path = None, run_metadata = None)

   Write ``run_summary.json`` and a run-marker sentinel to *output_root*.

   The marker file is one of ``_RUN_SUCCESS``, ``_RUN_PARTIAL``, or
   ``_RUN_FAILED``. Any pre-existing marker files are removed first so only
   one is present at a time.

   Run status logic:

   - *success*: at least one successful fit or eval, no failures
   - *partial*: both successes and failures present
   - *failed*: only failures, or *fatal_error* is set with no successes

   :param output_root: Directory that receives ``run_summary.json`` and the
                       marker file.
   :param fit_runs_path: Path to the fit runs JSON inventory (may not exist
                         yet).
   :param eval_runs_path: Path to the eval runs JSON inventory (may not exist
                          yet).
   :param run_summary_path: Optional explicit path for the summary JSON.
                            Defaults to ``output_root / "run_summary.json"``.
   :param fatal_error: If set, the run is marked as at least partially
                       failed.
   :param report_path: Path to the generated HTML report, if any.
   :param run_metadata: Extra key/value pairs merged into the summary
                        payload.

   :returns: The summary payload written to disk.
   :rtype: dict


.. py:function:: build_availability_record(*, scope, condition, unit_spec, container, requested_components, valid_components)

   Build a data-availability record from container shape information.

   Captures matrix dimensions and which n_components values are feasible vs
   skipped, for inclusion in run-level provenance metadata.
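
The upsert semantics described for :func:`update_runs` above can be illustrated with a small standalone sketch. This is not the coco-pipe implementation: ``upsert_record`` is a hypothetical helper, the key fields ``reducer`` and ``n_components`` are made up for the example, and the sort uses the key fields only (the additional run-taxonomy tie-breakers mentioned above are omitted).

```python
import json
import tempfile
from pathlib import Path


def upsert_record(path, record, key_fields):
    """Hypothetical stand-in for an inventory upsert; not the library code."""
    path = Path(path)
    # Create the parent directory if it does not exist yet.
    path.parent.mkdir(parents=True, exist_ok=True)

    # Load the existing inventory, or start from an empty list.
    records = json.loads(path.read_text()) if path.exists() else []

    # A run is identified by the tuple of its key-field values.
    def key(row):
        return tuple(row.get(field) for field in key_fields)

    # Drop any record with the same key, then add the new one (upsert).
    records = [row for row in records if key(row) != key(record)]
    records.append(record)

    # Sort by key fields only; values are stringified so mixed types compare.
    records.sort(key=lambda row: tuple(str(value) for value in key(row)))
    path.write_text(json.dumps(records, indent=2))
    return records


with tempfile.TemporaryDirectory() as tmp:
    inventory = Path(tmp) / "runs" / "fit_runs.json"
    keys = ["reducer", "n_components"]  # illustrative key fields
    upsert_record(inventory, {"reducer": "umap", "n_components": 2, "status": "failed"}, keys)
    upsert_record(inventory, {"reducer": "pca", "n_components": 2, "status": "success"}, keys)
    # Same key as the first record, so it replaces it instead of appending.
    result = upsert_record(inventory, {"reducer": "umap", "n_components": 2, "status": "success"}, keys)
```

After the three calls the inventory holds two records, because the third call replaces the earlier ``umap`` entry rather than adding a duplicate.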
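
The run status rules listed for :func:`write_run_status` above can be read as a small decision function. The sketch below is an interpretation of those rules, not the library code: ``classify_run`` and ``marker_name`` are hypothetical helpers, and treating a run with no successes, no failures, and no fatal error as ``"failed"`` is an assumption, since the documentation does not cover that case.

```python
def classify_run(n_success, n_failed, fatal_error=None):
    """Hypothetical helper mirroring the documented status rules."""
    # A fatal error counts as a failure even if no record is marked failed.
    has_failure = n_failed > 0 or fatal_error is not None
    if n_success > 0 and not has_failure:
        return "success"
    if n_success > 0 and has_failure:
        return "partial"
    # Only failures, a fatal error with no successes, or (by assumption)
    # an empty run.
    return "failed"


def marker_name(status):
    """Map a status to the marker file names given in the documentation."""
    return "_RUN_" + status.upper()


statuses = [
    classify_run(3, 0),                      # successes only
    classify_run(2, 1),                      # successes and failures
    classify_run(0, 4),                      # failures only
    classify_run(2, 0, fatal_error="boom"),  # fatal error after some successes
    classify_run(0, 0, fatal_error="boom"),  # fatal error, nothing succeeded
]
markers = [marker_name(status) for status in statuses]
```

Under this reading, a fatal error downgrades an otherwise successful run to *partial*, which matches the note that *fatal_error* marks the run as at least partially failed.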