coco_pipe.dim_reduction.pipeline
================================

.. py:module:: coco_pipe.dim_reduction.pipeline

.. autoapi-nested-parse::

   Checkpointed fit/eval pipeline for dimensionality-reduction runs.

   This module contains the core execution primitives that any project using
   coco-pipe's dim-reduction stack can share.  Each function is intentionally
   side-effect-free aside from writing artifact files — the caller controls
   all paths and inventory updates.

   Public API
   ----------
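   run_fit
       Fit one reducer variant on one analysis unit, writing a checkpointed
       artifact directory.
   run_fit_group
       Fit one unit's ``n_components`` sweep for a single reducer, reusing one
       max-n fit when the reducer is nested.
   supports_nested_components
       Report whether a method can synthesise its whole sweep from one max-n
       fit.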
   run_eval
       Run one post-hoc evaluation of a saved embedding, writing a checkpointed
       eval artifact directory.
   build_auto_pooled_eval_spec
       Build the automatic ``condition_separation`` eval spec used when pooling
       is active.
   valid_n_components_for_container
       Check whether *n_components* is feasible for a container's matrix shape.
   valid_component_sweep
       Filter a list of component counts to feasible values.
   prepare_eval_inputs
       Align a DataContainer to saved fit ids and resolve eval labels/groups.
   build_fit_request / build_eval_request
       Construct request dictionaries that can be passed to ``run_fit`` and
       ``run_eval``.


Functions
---------

.. autoapisummary::

   coco_pipe.dim_reduction.pipeline.supports_nested_components
   coco_pipe.dim_reduction.pipeline.run_fit
   coco_pipe.dim_reduction.pipeline.run_fit_group
   coco_pipe.dim_reduction.pipeline.run_eval
   coco_pipe.dim_reduction.pipeline.prepare_eval_inputs
   coco_pipe.dim_reduction.pipeline.valid_n_components_for_container
   coco_pipe.dim_reduction.pipeline.valid_component_sweep
   coco_pipe.dim_reduction.pipeline.build_auto_pooled_eval_spec
   coco_pipe.dim_reduction.pipeline.build_fit_request
   coco_pipe.dim_reduction.pipeline.build_eval_request


Module Contents
---------------

.. py:function:: supports_nested_components(method)

   Whether *method* can synthesise its whole sweep from one max-n fit.

   Nested reducers (PCA family, SVD) decompose once at the largest
   ``n_components`` and slice the smaller sweep values out of that single fit;
   everything else (UMAP, t-SNE, PHATE, Isomap, ICA, …) must fit independently
   per dimension. Callers use this both to fit efficiently
   (:func:`run_fit_group`) and to decide the parallel grain: a non-nested
   reducer's sweep is a set of independent fits that can run as separate tasks
   rather than one serial group. The result is cached, because instantiating
   ``DimReduction`` only to read ``capabilities`` on every request would be
   wasteful.
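
   The caching idea can be sketched with :func:`functools.lru_cache`. This is
   an illustration only: the capability table, the ``_lookup_capabilities``
   helper, and the ``supports_nested`` name are hypothetical stand-ins for
   instantiating ``DimReduction`` and are not part of coco-pipe.

   .. code-block:: python

      from functools import lru_cache

      # Hypothetical capability table standing in for a reducer's real
      # ``capabilities``; the method names and flags are illustrative only.
      _CAPABILITIES = {
          "pca": {"nested_components": True},
          "umap": {"nested_components": False},
      }

      _lookups = []  # records each expensive lookup so caching is observable


      def _lookup_capabilities(method):
          """Stand-in for the costly step of building a reducer to read flags."""
          _lookups.append(method)
          return _CAPABILITIES.get(method.lower(), {})


      @lru_cache(maxsize=None)
      def supports_nested(method):
          """Return True if *method* can slice its sweep from one max-n fit."""
          caps = _lookup_capabilities(method)
          return bool(caps.get("nested_components", False))


      answers = [supports_nested("pca") for _ in range(3)]
      answers.append(supports_nested("umap"))

   Repeated calls with the same method name hit the cache, so the expensive
   lookup runs once per distinct method rather than once per request.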


.. py:function:: run_fit(fit_payload, container, out_path, output_root, overwrite, *, errors = 'raise')

   Fit one reducer variant and return a fit-runs inventory record.

   If ``_SUCCESS`` already exists in *out_path* and *overwrite* is ``False``,
   the existing artifact is loaded and its inventory record is returned
   immediately (checkpoint resume).

   :param fit_payload: Provenance/config dict describing this fit (reducer, n_components,
                       scope, condition, unit info, input signature, …).
   :param container: Data container for this analysis unit.  Must have ``ids``.
   :param out_path: Artifact directory to write (or resume from).
   :param output_root: Root of the entire run output.  Used for relative-path computation in
                       the returned inventory record.
   :param overwrite: When ``True``, an existing *out_path* directory is deleted before
                     fitting.
   :param errors: ``"raise"`` (default) propagates exceptions; ``"record"`` catches them,
                  logs, and returns a failed inventory record of the same shape.

   :returns: A flat inventory record suitable for passing to :func:`update_runs`.
   :rtype: dict
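
   A minimal sketch of the ``_SUCCESS`` checkpoint pattern described above,
   using only the standard library. ``fit_with_checkpoint``, ``fake_fit``, and
   the ``record.json`` file name are assumptions made for illustration; the
   actual artifact layout written by :func:`run_fit` is not shown in this
   reference.

   .. code-block:: python

      import json
      import shutil
      import tempfile
      from pathlib import Path


      def fit_with_checkpoint(out_path, overwrite, fit_fn):
          """Run *fit_fn* once per artifact directory, resuming when possible."""
          out_path = Path(out_path)
          marker = out_path / "_SUCCESS"
          record_file = out_path / "record.json"  # hypothetical file name

          if marker.exists() and not overwrite:
              # Checkpoint resume: reuse the saved record and skip fitting.
              return json.loads(record_file.read_text()), True

          if overwrite and out_path.exists():
              shutil.rmtree(out_path)  # drop the stale artifact directory
          out_path.mkdir(parents=True, exist_ok=True)

          record = fit_fn()
          record_file.write_text(json.dumps(record))
          marker.touch()  # written last, so a partial write is never resumed
          return record, False


      calls = []


      def fake_fit():
          calls.append(1)
          return {"status": "ok", "n_components": 2}


      with tempfile.TemporaryDirectory() as root:
          target = Path(root) / "fit_a"
          first, resumed_first = fit_with_checkpoint(target, False, fake_fit)
          second, resumed_second = fit_with_checkpoint(target, False, fake_fit)
          third, resumed_third = fit_with_checkpoint(target, True, fake_fit)

   The second call resumes from the marker without refitting; the third call
   deletes the directory and fits again because ``overwrite`` is ``True``.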


.. py:function:: run_fit_group(requests, *, errors = 'raise')

   Fit a group of requests sharing one analysis unit and reducer.

   All *requests* must describe the same container and reducer, differing only
   in ``n_components`` (as produced by :func:`build_fit_request` for one unit's
   sweep).  When the reducer is hierarchically nested, the largest
   ``n_components`` is fitted once and the smaller sweep values are synthesised
   by slicing the embedding, components, and explained-variance arrays —
   avoiding a redundant decomposition per sweep value.  Non-nested reducers
   (or singleton groups) fall back to an independent :func:`run_fit` per request,
   so behaviour is unchanged for UMAP/t-SNE/ICA/etc.

   Returns one inventory record per request, ordered by how each request was
   resolved: resumed records first, then synthesised ones.
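
   The slicing step for a nested reducer can be illustrated with a plain
   SVD-based PCA in NumPy. This is a sketch under assumptions, not coco-pipe's
   implementation: the variable names and the dictionary layout are
   hypothetical, and the data is random.

   .. code-block:: python

      import numpy as np

      rng = np.random.default_rng(0)
      X = rng.normal(size=(50, 6))
      Xc = X - X.mean(axis=0)  # PCA operates on centred data

      # One decomposition, truncated at the largest requested n_components.
      sweep = [2, 3, 5]
      n_max = max(sweep)
      U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
      embedding_max = U[:, :n_max] * S[:n_max]
      components_max = Vt[:n_max]
      explained_max = (S[:n_max] ** 2) / (Xc.shape[0] - 1)

      # Smaller sweep values are leading slices of the max-n result.
      synthesised = {
          n: {
              "embedding": embedding_max[:, :n],
              "components": components_max[:n],
              "explained_variance": explained_max[:n],
          }
          for n in sweep
      }

      # Check: the 2-component slice equals a direct projection on 2 axes.
      direct_2 = Xc @ Vt[:2].T
      matches = np.allclose(synthesised[2]["embedding"], direct_2)

   Slicing works here because the principal axes are ordered and do not depend
   on how many of them are kept. That property does not hold for methods such
   as UMAP or t-SNE, which is why they need one fit per sweep value.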


.. py:function:: run_eval(fit_artifact, container, eval_spec, out_path, output_root, overwrite, *, errors = 'raise')

   Run one post-hoc evaluation and return an eval-runs inventory record.

   The fit provenance is read from ``fit_artifact["fit"]``.  If ``_SUCCESS``
   already exists in *out_path* and *overwrite* is ``False``, the existing
   eval artifact is loaded and its inventory record is returned immediately
   (checkpoint resume).

   :param fit_artifact: Full fit artifact dict as returned by :func:`load_fit_artifact`.
   :param container: Data container for the analysis unit (must contain the columns
                     referenced by *eval_spec*).
   :param eval_spec: Eval specification dict with keys ``name``, ``target_col``,
                     ``group_col``, ``filters``, ``label_map``.
   :param out_path: Artifact directory to write (or resume from).
   :param output_root: Root of the entire run output.
   :param overwrite: When ``True``, an existing *out_path* directory is deleted before
                     evaluating.
   :param errors: ``"raise"`` (default) propagates exceptions; ``"record"`` catches them,
                  logs, and returns a failed inventory record.

   :returns: A flat eval inventory record suitable for passing to
             :func:`update_runs`.
   :rtype: dict
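
   The two ``errors`` modes can be sketched as follows.
   ``run_with_error_policy`` and the ``status``, ``error``, and ``result`` keys
   are hypothetical; the real inventory record fields are not listed in this
   reference.

   .. code-block:: python

      import logging

      logger = logging.getLogger("eval_sketch")


      def run_with_error_policy(task, base_record, errors="raise"):
          """Run *task*; on failure re-raise or return a failed record."""
          if errors not in ("raise", "record"):
              raise ValueError(f"errors must be 'raise' or 'record': {errors!r}")
          try:
              result = task()
          except Exception as exc:
              if errors == "raise":
                  raise
              logger.exception("task failed")
              # Same keys as a success record, so inventories stay tabular.
              return {
                  **base_record,
                  "status": "failed",
                  "error": repr(exc),
                  "result": None,
              }
          return {**base_record, "status": "ok", "error": None, "result": result}


      def broken():
          raise RuntimeError("no valid samples")


      ok = run_with_error_policy(lambda: 0.9, {"eval": "demo"}, errors="record")
      failed = run_with_error_policy(broken, {"eval": "demo"}, errors="record")

      try:
          run_with_error_policy(broken, {"eval": "demo"}, errors="raise")
          raised = False
      except RuntimeError:
          raised = True

   Recording failures lets a batch of evaluations finish and be inspected
   afterwards, while raising stops at the first problem.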


.. py:function:: prepare_eval_inputs(container, fit_ids, eval_spec)

   Align *container* observations to *fit_ids* and apply eval filters.

   The saved fit may cover a different (or differently ordered) subset of
   observations than the current container, so this function aligns by
   observation id (with occurrence-count disambiguation for duplicate ids),
   applies column filters from *eval_spec*, resolves labels and groups, and
   masks out missing values.

   :param container: DataContainer that holds the metadata columns referenced by
                     *eval_spec*.
   :param fit_ids: Observation ids in the order they appear in the saved embedding.
   :param eval_spec: Eval specification dict (``name``, ``target_col``, ``group_col``,
                     ``filters``, ``label_map``).

   :returns: ``(selected_index, selected_ids, labels, groups)``.
             *selected_index* is the pandas :class:`~pandas.Index` into the
             aligned frame (suitable for slicing the embedding array);
             *selected_ids*, *labels*, and *groups* are numpy arrays of
             strings.
   :rtype: tuple of (selected_index, selected_ids, labels, groups)

   :raises ValueError: On missing columns or structural issues.
   :raises RuntimeError: When no valid samples remain after alignment and filtering.
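
   One way to align by observation id when ids repeat is to pair every id with
   its running occurrence count, so the first ``"s1"`` in the fit matches the
   first ``"s1"`` in the container, the second matches the second, and so on.
   The helpers below are a hypothetical pure-Python sketch of that idea, not
   the function's actual code; raising ``ValueError`` for unmatched ids is an
   assumption.

   .. code-block:: python

      from collections import defaultdict


      def occurrence_keys(ids):
          """Pair each id with its running count: a, a -> (a, 0), (a, 1)."""
          seen = defaultdict(int)
          keys = []
          for obs_id in ids:
              keys.append((obs_id, seen[obs_id]))
              seen[obs_id] += 1
          return keys


      def align_positions(container_ids, fit_ids):
          """Return, for each fit id, the matching container row position."""
          container_keys = occurrence_keys(container_ids)
          position = {key: i for i, key in enumerate(container_keys)}
          fit_keys = occurrence_keys(fit_ids)
          missing = [key for key in fit_keys if key not in position]
          if missing:
              # Assumed behaviour: unmatched ids are a structural error.
              raise ValueError(f"fit ids absent from container: {missing}")
          return [position[key] for key in fit_keys]


      container_ids = ["s2", "s1", "s1", "s3"]
      fit_ids = ["s1", "s3", "s1"]
      rows = align_positions(container_ids, fit_ids)

   The resulting positions reorder the container's metadata rows into the
   embedding's row order, after which filters and label resolution can be
   applied.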


.. py:function:: valid_n_components_for_container(container, n_components)

   Return ``True`` if *n_components* is feasible for *container*'s matrix.


.. py:function:: valid_component_sweep(container, requested)

   Filter *requested* to the component counts feasible for *container*.

   Logs a message if any values are skipped.
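
   The feasibility rule itself is not stated in this reference. The sketch
   below assumes a common one, ``1 <= n_components <= min(n_samples,
   n_features)``, purely for illustration; ``is_feasible`` and
   ``feasible_sweep`` are hypothetical names that take a matrix shape rather
   than a container.

   .. code-block:: python

      import logging

      logger = logging.getLogger("sweep_sketch")


      def is_feasible(shape, n_components):
          """ASSUMED rule: 1 <= n_components <= min(n_samples, n_features)."""
          n_samples, n_features = shape
          return 1 <= n_components <= min(n_samples, n_features)


      def feasible_sweep(shape, requested):
          """Keep feasible values in their original order and log the rest."""
          kept = [n for n in requested if is_feasible(shape, n)]
          skipped = [n for n in requested if not is_feasible(shape, n)]
          if skipped:
              logger.warning("skipping infeasible n_components: %s", skipped)
          return kept


      kept = feasible_sweep((40, 8), [2, 4, 8, 16])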


.. py:function:: build_auto_pooled_eval_spec(conditions, run_pooled)

   Return a ``condition_separation`` eval spec, or ``None``.

   The spec is only produced when *run_pooled* is ``True`` and at least two
   conditions are present — otherwise condition separation is not meaningful.

   :param conditions: List of condition names that will be included in the pooled container.
   :param run_pooled: Whether the caller intends to run a pooled analysis.
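
   The gating rule can be sketched as below. The dictionary keys follow the
   eval spec keys named elsewhere on this page; every value other than the
   spec name (for example the ``"condition"`` column) is an assumption, and
   ``auto_pooled_spec`` is a hypothetical name.

   .. code-block:: python

      def auto_pooled_spec(conditions, run_pooled):
          """Return a condition-separation spec, or None if not meaningful."""
          if not run_pooled or len(conditions) < 2:
              return None
          return {
              "name": "condition_separation",
              "target_col": "condition",  # assumed column name
              "group_col": None,          # assumed default
              "filters": {},              # assumed default
              "label_map": None,          # assumed default
          }


      pooled_two = auto_pooled_spec(["rest", "task"], run_pooled=True)
      pooled_one = auto_pooled_spec(["rest"], run_pooled=True)
      not_pooled = auto_pooled_spec(["rest", "task"], run_pooled=False)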


.. py:function:: build_fit_request(*, container, scope, condition, unit_spec, reducer, n_components, input_signature, output_root, overwrite = False, subject_col = 'subject', extra_payload = None, artifact_path = None, artifact_path_factory = None)

   Build a request dictionary suitable for passing to :func:`run_fit`.

   The caller owns project-specific input provenance via *input_signature* and
   optional *extra_payload*. coco-pipe owns the deterministic fit id, standard
   fit payload fields, and default flat artifact path.
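
   How the fit id is derived is not documented here. One common way to obtain
   a deterministic id is to hash a canonical encoding of the payload, as in
   this sketch; ``deterministic_fit_id``, the SHA-256 choice, and the 12
   character length are all assumptions.

   .. code-block:: python

      import hashlib
      import json


      def deterministic_fit_id(payload, length=12):
          """Hash canonical JSON so equal payloads always give equal ids."""
          canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
          digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
          return digest[:length]


      payload_a = {"reducer": "pca", "n_components": 3, "scope": "pooled"}
      payload_b = {"scope": "pooled", "n_components": 3, "reducer": "pca"}
      payload_c = {**payload_a, "n_components": 4}

      id_a = deterministic_fit_id(payload_a)
      id_b = deterministic_fit_id(payload_b)
      id_c = deterministic_fit_id(payload_c)

   Sorting keys before hashing makes the id independent of dictionary order,
   so re-running with the same configuration maps to the same artifact
   directory and can resume from its checkpoint.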


.. py:function:: build_eval_request(*, fit_record, eval_spec, container, output_root, overwrite = False, fit_artifact = None, artifact_path = None, artifact_path_factory = None)

   Build a request dictionary suitable for passing to :func:`run_eval`.

   By default, the fit artifact is loaded from ``fit_record['artifact_path']``
   relative to *output_root* and the eval artifact is placed under the flat
   ``artifacts/evals`` directory.
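
   The default path resolution can be pictured with :mod:`pathlib`. This is a
   sketch under assumptions: ``resolve_eval_paths``, the ``eval_id`` directory
   name, and the example paths are hypothetical, and the
   *artifact_path_factory* hook is not modelled.

   .. code-block:: python

      from pathlib import Path


      def resolve_eval_paths(fit_record, output_root, eval_id, artifact_path=None):
          """Return (fit artifact dir, eval artifact dir) for one request."""
          output_root = Path(output_root)
          # The fit artifact path is stored relative to the run's output root.
          fit_dir = output_root / fit_record["artifact_path"]
          if artifact_path is not None:
              eval_dir = Path(artifact_path)  # explicit override wins
          else:
              eval_dir = output_root / "artifacts" / "evals" / eval_id
          return fit_dir, eval_dir


      record = {"artifact_path": "artifacts/fits/abc123"}
      fit_dir, eval_dir = resolve_eval_paths(record, "runs/exp1", "abc123__sep")
      _, custom_dir = resolve_eval_paths(
          record, "runs/exp1", "abc123__sep", artifact_path="elsewhere/eval"
      )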