coco_pipe.dim_reduction.artifacts
=================================

.. py:module:: coco_pipe.dim_reduction.artifacts

.. autoapi-nested-parse::

   Artifact persistence helpers for dimensionality-reduction runs.

   This module owns the save/load/update layer that the dim-reduction pipeline
   uses to checkpoint fit results and post-hoc eval results to disk.  Keeping
   these functions in coco-pipe (rather than in a consumer project) means any
   project that runs the dim-reduction pipeline can share the same artifact
   format without duplicating the serialization logic.

   Public API
   ----------
   save_fit_artifact
       Write embedding, ids, fit payload, metrics, and diagnostics to a
       directory and stamp it with ``_SUCCESS``.
   save_eval_artifact
       Write an eval payload to a directory and stamp it with ``_SUCCESS``.
   load_fit_artifact
       Read a fit artifact directory back into a dict.
   load_fit_runs
       Read a JSON runs inventory file back into a list of dicts.
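   update_runs
       Upsert a record into a JSON runs inventory, keeping it sorted.
   build_record
       Build a flat run-inventory record from a fit or eval payload.
   write_run_status
       Write ``run_summary.json`` and a single run-marker sentinel for the whole run.
   build_availability_record
       Record matrix dimensions and which ``n_components`` values are feasible or skipped.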

   Constants
   ---------
   SEPARATION_METRIC_KEY
       Canonical key for the logistic-regression separation metric returned by
       :func:`coco_pipe.dim_reduction.evaluation.core.evaluate_embedding`.
   FIT_METRIC_COLUMNS
       Ordered list of geometry quality metric names written by fit artifacts.
   EVAL_METRIC_COLUMNS
       Ordered list of eval metric names written by eval artifacts.
   FIT_RUN_KEY_FIELDS
       Fields that uniquely identify a fit run in the runs inventory.
   EVAL_RUN_KEY_FIELDS
       Fields that uniquely identify an eval run in the runs inventory.


Functions
---------

.. autoapisummary::

   coco_pipe.dim_reduction.artifacts.save_fit_artifact
   coco_pipe.dim_reduction.artifacts.save_eval_artifact
   coco_pipe.dim_reduction.artifacts.load_fit_artifact
   coco_pipe.dim_reduction.artifacts.load_fit_runs
   coco_pipe.dim_reduction.artifacts.update_runs
   coco_pipe.dim_reduction.artifacts.build_record
   coco_pipe.dim_reduction.artifacts.write_run_status
   coco_pipe.dim_reduction.artifacts.build_availability_record


Module Contents
---------------

.. py:function:: save_fit_artifact(path, embedding, ids, fit_payload, metrics_payload, diagnostics)

   Write a fit artifact to *path* and stamp it with ``_SUCCESS``.

   The artifact directory receives three files (a compact layout that keeps the
   inode count low for high-cardinality analysis modes):

   - ``fit.npz``: embedding, ids, and reducer diagnostics in one archive
   - ``fit.json``: the fit payload and geometry quality metrics
   - ``_SUCCESS``: sentinel that marks the artifact as complete

   :func:`load_fit_artifact` also reads the older seven-file layout, so existing
   artifacts remain loadable.

   :param path: Target directory.  Created (including parents) if it does not exist.
   :param embedding: Array of shape ``(n_obs, n_dims)``.
   :param ids: 1-D array of observation identifiers aligned with *embedding*.
   :param fit_payload: Serialisable dict describing the fit (reducer, config, provenance, …).
                       The key ``"artifact_stem"`` is recorded for provenance.
   :param metrics_payload: Serialisable dict of geometry quality metrics (trustworthiness, …).
   :param diagnostics: Serialisable dict of reducer-internal diagnostics.  May be empty.
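
   The following is a minimal, hypothetical sketch of the three-file layout using only
   NumPy and the standard library; it is not the coco-pipe implementation. The name
   ``write_compact_fit_dir``, the ``diag_`` prefix for diagnostics inside ``fit.npz``,
   and the ``"fit"``/``"metrics"`` keys in ``fit.json`` are assumptions, and the
   diagnostics values are assumed to be array-like.

   .. code-block:: python

      import json
      from pathlib import Path

      import numpy as np


      def write_compact_fit_dir(path, embedding, ids, fit_payload, metrics_payload, diagnostics):
          """Hypothetical sketch of the three-file fit layout (not the library code)."""
          out = Path(path)
          out.mkdir(parents=True, exist_ok=True)  # create the directory and any parents
          # Assumption: diagnostics share the archive under a "diag_" prefix.
          arrays = {f"diag_{k}": np.asarray(v) for k, v in diagnostics.items()}
          np.savez(out / "fit.npz", embedding=np.asarray(embedding), ids=np.asarray(ids), **arrays)
          # Assumption: fit.json nests the payload and metrics under these two keys.
          with open(out / "fit.json", "w") as fh:
              json.dump({"fit": fit_payload, "metrics": metrics_payload}, fh, indent=2)
          # Write the sentinel last so its presence implies the other files are complete.
          (out / "_SUCCESS").touch()
          return out

   Writing ``_SUCCESS`` last lets a reader treat its absence as an interrupted write.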


.. py:function:: save_eval_artifact(path, eval_payload)

   Write an eval artifact to *path* and stamp it with ``_SUCCESS``.

   The artifact directory receives two files:

   - ``eval.json``: the full eval payload
   - ``_SUCCESS``: sentinel that marks the artifact as complete

   The private loader ``_load_eval_payload`` also reads the older ``<stem>_eval.json`` layout.

   :param path: Target directory.  Created (including parents) if it does not exist.
   :param eval_payload: Serialisable dict describing the post-hoc evaluation results.  The key
                        ``"artifact_stem"`` is recorded for provenance when present.
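
   A hypothetical sketch of writing the eval layout and reading it back with a legacy
   fallback. ``write_eval_dir`` and ``read_eval_payload`` are illustrative names, and
   locating the legacy file with a ``*_eval.json`` glob is an assumption about how
   ``<stem>`` is resolved.

   .. code-block:: python

      import json
      from pathlib import Path


      def write_eval_dir(path, eval_payload):
          """Hypothetical sketch of the two-file eval layout (not the library code)."""
          out = Path(path)
          out.mkdir(parents=True, exist_ok=True)
          with open(out / "eval.json", "w") as fh:
              json.dump(eval_payload, fh, indent=2)
          (out / "_SUCCESS").touch()  # sentinel written only after eval.json
          return out


      def read_eval_payload(path):
          """Prefer eval.json; fall back to a legacy ``<stem>_eval.json`` file."""
          out = Path(path)
          current = out / "eval.json"
          if current.exists():
              return json.loads(current.read_text())
          # Assumption: at most one legacy file per directory; take the first match.
          legacy = sorted(out.glob("*_eval.json"))
          if not legacy:
              raise FileNotFoundError(f"no eval payload in {out}")
          return json.loads(legacy[0].read_text())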


.. py:function:: load_fit_artifact(path)

   Load a fit artifact directory into a dict.

   The compact ``fit.npz`` + ``fit.json`` layout written by
   :func:`save_fit_artifact` is read first. If it is absent, the legacy seven-file
   layout is resolved through ``artifact_manifest.json``, with a glob over the
   directory as a fallback, so artifacts written by earlier versions remain loadable.

   :param path: Artifact directory written by :func:`save_fit_artifact`.

   :returns: A dict with keys ``embedding``, ``ids``, ``fit``, ``metrics``,
             ``diagnostics``, ``manifest``, ``path``, and ``embedding_container``.
             ``embedding_container`` is a :class:`~coco_pipe.io.structures.DataContainer`
             view of the embedding (or ``None`` for non-2D embeddings); the
             ``embedding`` and ``ids`` arrays remain available for direct array access.
   :rtype: dict
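
   A minimal sketch of the compact-layout branch only, assuming ``fit.json`` stores the
   payload and metrics under ``"fit"`` and ``"metrics"`` and that diagnostics sit in
   ``fit.npz`` under a ``diag_`` prefix (both assumptions). The legacy manifest/glob
   path, the ``manifest`` key, and ``embedding_container`` are omitted, and
   ``read_compact_fit_dir`` is a hypothetical name.

   .. code-block:: python

      import json
      from pathlib import Path

      import numpy as np


      def read_compact_fit_dir(path):
          """Hypothetical reader for the compact layout; legacy layouts are not handled."""
          src = Path(path)
          npz_path, json_path = src / "fit.npz", src / "fit.json"
          if not (npz_path.exists() and json_path.exists()):
              # load_fit_artifact falls back to the legacy seven-file layout here.
              raise FileNotFoundError(f"{src} does not use the compact fit layout")
          with np.load(npz_path) as archive:
              arrays = {name: archive[name] for name in archive.files}
          meta = json.loads(json_path.read_text())
          return {
              "embedding": arrays.pop("embedding"),
              "ids": arrays.pop("ids"),
              # Assumption: every remaining array is a "diag_"-prefixed diagnostic.
              "diagnostics": {k.removeprefix("diag_"): v for k, v in arrays.items()},
              "fit": meta.get("fit", {}),
              "metrics": meta.get("metrics", {}),
              "path": src,
          }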


.. py:function:: load_fit_runs(path)

   Load a fit-runs inventory JSON file into a list of dicts.

   :param path: Path to the JSON file written by :func:`update_runs`.

   :rtype: list of dict

   :raises RuntimeError: If *path* does not exist.
   :raises ValueError: If the file does not contain a JSON list.
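
   The documented contract can be sketched as follows; ``read_runs_inventory`` is a
   hypothetical name and the error messages are illustrative.

   .. code-block:: python

      import json
      from pathlib import Path


      def read_runs_inventory(path):
          """Sketch of the documented load_fit_runs contract (hypothetical name)."""
          src = Path(path)
          if not src.exists():
              # A missing inventory is reported as RuntimeError, as documented.
              raise RuntimeError(f"runs inventory not found: {src}")
          data = json.loads(src.read_text())
          if not isinstance(data, list):
              raise ValueError(f"expected a JSON list in {src}, got {type(data).__name__}")
          return data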


.. py:function:: update_runs(path, record, key_fields)

   Upsert *record* into a JSON runs inventory at *path*.

   If *path* already exists, the list is loaded and the record whose
   ``key_fields`` values match *record*'s is replaced (upsert semantics).
   If no match is found, *record* is appended.  The list is then sorted by
   ``key_fields`` followed by common run-taxonomy fields and written back.

   :param path: Target JSON file.  The parent directory is created if necessary.
   :param record: Dict to upsert.  Must contain all fields listed in *key_fields*.
   :param key_fields: Ordered sequence of field names that uniquely identify a run.
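
   A sketch of the upsert-and-sort semantics, assuming records are plain JSON objects.
   The additional run-taxonomy sort fields are not specified here, so the sketch sorts
   on *key_fields* alone; ``upsert_run`` is a hypothetical name.

   .. code-block:: python

      import json
      from pathlib import Path


      def upsert_run(path, record, key_fields):
          """Hypothetical sketch of upsert-then-sort; taxonomy sort fields are omitted."""
          target = Path(path)
          target.parent.mkdir(parents=True, exist_ok=True)
          runs = json.loads(target.read_text()) if target.exists() else []
          key = tuple(record[f] for f in key_fields)  # KeyError if a key field is missing
          # Drop any existing record with the same key, then add the new one.
          runs = [r for r in runs if tuple(r.get(f) for f in key_fields) != key]
          runs.append(record)
          # Sort on the key fields; str() keeps mixed types and None comparable.
          runs.sort(key=lambda r: tuple(str(r.get(f)) for f in key_fields))
          target.write_text(json.dumps(runs, indent=2))
          return runs

   For example, upserting ``{"reducer": "umap", "seed": 1, ...}`` twice with
   ``key_fields=["reducer", "seed"]`` leaves a single ``umap`` record holding the
   later values.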


.. py:function:: build_record(payload, artifact_path, output_root, metric_columns, metrics_payload = None, error = None)

   Build a flat run-inventory record from an artifact payload dict.

   Used for both fit and eval records: pass :data:`FIT_METRIC_COLUMNS` or
   :data:`EVAL_METRIC_COLUMNS` as *metric_columns*.

   Strips bulky sub-dicts (``metrics``, ``records``, ``metadata``,
   ``artifacts``) from *payload*, adds a relative ``artifact_path``,
   promotes each metric in *metric_columns* to a top-level float (or
   ``nan``), and stamps the record with a ``status`` of ``"success"`` or
   ``"failed"``.
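
   A hypothetical sketch of that flattening. Two points are assumptions rather than
   documented behaviour: metrics are taken from *metrics_payload* when given and from
   ``payload["metrics"]`` otherwise, and a record is ``"failed"`` exactly when *error*
   is supplied. ``flatten_record`` and the stored ``"error"`` key are illustrative.

   .. code-block:: python

      import math
      import os

      BULKY_KEYS = ("metrics", "records", "metadata", "artifacts")


      def flatten_record(payload, artifact_path, output_root, metric_columns,
                         metrics_payload=None, error=None):
          """Hypothetical sketch of the flattening described above."""
          record = {k: v for k, v in payload.items() if k not in BULKY_KEYS}
          # Store the artifact location relative to the output root.
          record["artifact_path"] = os.path.relpath(artifact_path, output_root)
          # Assumption: prefer metrics_payload, else fall back to payload["metrics"].
          source = metrics_payload if metrics_payload is not None else payload.get("metrics", {})
          for name in metric_columns:
              value = source.get(name)
              record[name] = float(value) if value is not None else math.nan
          # Assumption: "failed" exactly when an error is supplied.
          record["status"] = "failed" if error is not None else "success"
          if error is not None:
              record["error"] = str(error)
          return record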


.. py:function:: write_run_status(output_root, fit_runs_path, eval_runs_path, *, run_summary_path = None, fatal_error = None, report_path = None, run_metadata = None)

   Write ``run_summary.json`` and a run-marker sentinel to *output_root*.

   The marker file is one of ``_RUN_SUCCESS``, ``_RUN_PARTIAL``, or
   ``_RUN_FAILED``.  Any pre-existing marker files are removed first so only
   one is present at a time.

   Run status logic:

   - *success*: at least one successful fit or eval, and no failures
   - *partial*: both successes and failures are present
   - *failed*: only failures, or *fatal_error* is set with no successes

   :param output_root: Directory that receives ``run_summary.json`` and the marker file.
   :param fit_runs_path: Path to the fit runs JSON inventory (may not exist yet).
   :param eval_runs_path: Path to the eval runs JSON inventory (may not exist yet).
   :param run_summary_path: Optional explicit path for the summary JSON. Defaults to
                            ``output_root / "run_summary.json"``.
   :param fatal_error: If set, the run is marked as at least partially failed.
   :param report_path: Path to the generated HTML report, if any.
   :param run_metadata: Extra key/value pairs merged into the summary payload.

   :returns: The summary payload written to disk.
   :rtype: dict
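
   The status rules and the single-marker guarantee can be sketched as below.
   ``classify_run`` and ``write_marker`` are hypothetical helpers that take success and
   failure counts directly instead of reading the inventories, and they do not write
   ``run_summary.json``. Treating a run with no successes and no failures as
   ``"failed"`` is an assumption that the rules above do not cover.

   .. code-block:: python

      from pathlib import Path

      MARKERS = {"success": "_RUN_SUCCESS", "partial": "_RUN_PARTIAL", "failed": "_RUN_FAILED"}


      def classify_run(n_success, n_failed, fatal_error=None):
          """Apply the documented status rules to success/failure counts."""
          has_failure = n_failed > 0 or fatal_error is not None
          if n_success > 0 and not has_failure:
              return "success"
          if n_success > 0:
              return "partial"  # a fatal error alongside successes counts as partial
          # Assumption: no successes at all (even with no failures) counts as failed.
          return "failed"


      def write_marker(output_root, status):
          """Remove stale markers, then write exactly one marker for *status*."""
          root = Path(output_root)
          root.mkdir(parents=True, exist_ok=True)
          for name in MARKERS.values():
              (root / name).unlink(missing_ok=True)
          marker = root / MARKERS[status]
          marker.touch()
          return marker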


.. py:function:: build_availability_record(*, scope, condition, unit_spec, container, requested_components, valid_components)

   Build a data-availability record from container shape information.

   Captures the matrix dimensions and which ``n_components`` values are feasible
   versus skipped, for inclusion in run-level provenance metadata.
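
   A hypothetical sketch of such a record. It takes an ``(n_obs, n_features)`` shape
   tuple in place of the real *container* argument, and the field names are
   illustrative rather than the keys the function actually writes.

   .. code-block:: python

      def availability_record(scope, condition, unit_spec, shape,
                              requested_components, valid_components):
          """Hypothetical record layout; field names are illustrative only."""
          n_obs, n_features = shape
          requested = list(requested_components)  # materialise once, keep caller order
          valid = set(valid_components)
          return {
              "scope": scope,
              "condition": condition,
              "unit_spec": unit_spec,
              "n_obs": n_obs,
              "n_features": n_features,
              "requested_components": requested,
              "feasible_components": [k for k in requested if k in valid],
              "skipped_components": [k for k in requested if k not in valid],
          }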