coco_pipe.dim_reduction.pipeline
================================

.. py:module:: coco_pipe.dim_reduction.pipeline

.. autoapi-nested-parse::

   Checkpointed fit/eval pipeline for dimensionality-reduction runs.

   This module contains the core execution primitives that any project using
   coco-pipe's dim-reduction stack can share. Each function is intentionally
   side-effect-free aside from writing artifact files; the caller controls
   all paths and inventory updates.

   Public API
   ----------
   run_fit
       Fit one reducer variant on one analysis unit, writing a checkpointed
       artifact directory.
   run_eval
       Run one post-hoc evaluation of a saved embedding, writing a
       checkpointed eval artifact directory.
   build_auto_pooled_eval_spec
       Build the automatic ``condition_separation`` eval spec used when
       pooling is active.
   valid_n_components_for_container
       Check whether *n_components* is feasible for a container's matrix
       shape.
   valid_component_sweep
       Filter a list of component counts to feasible values.
   prepare_eval_inputs
       Align a DataContainer to saved fit ids and resolve eval labels/groups.
   build_fit_request / build_eval_request
       Construct request dictionaries that can be passed to ``run_fit`` and
       ``run_eval``.


Functions
---------

.. autoapisummary::

   coco_pipe.dim_reduction.pipeline.supports_nested_components
   coco_pipe.dim_reduction.pipeline.run_fit
   coco_pipe.dim_reduction.pipeline.run_fit_group
   coco_pipe.dim_reduction.pipeline.run_eval
   coco_pipe.dim_reduction.pipeline.prepare_eval_inputs
   coco_pipe.dim_reduction.pipeline.valid_n_components_for_container
   coco_pipe.dim_reduction.pipeline.valid_component_sweep
   coco_pipe.dim_reduction.pipeline.build_auto_pooled_eval_spec
   coco_pipe.dim_reduction.pipeline.build_fit_request
   coco_pipe.dim_reduction.pipeline.build_eval_request


Module Contents
---------------

.. py:function:: supports_nested_components(method)

   Whether *method* can synthesise its whole sweep from one max-n fit.

   Nested reducers (PCA family, SVD) decompose once at the largest
   ``n_components`` and slice the smaller sweep values out of that single
   fit; everything else (UMAP, t-SNE, PHATE, Isomap, ICA, …) must fit
   independently per dimension. Callers use this both to fit efficiently
   (:func:`run_fit_group`) and to decide the parallel grain: a non-nested
   reducer's sweep is a set of independent fits that can run as separate
   tasks rather than one serial group.

   Cached because instantiating ``DimReduction`` only to read
   ``capabilities`` is wasteful to repeat per request.


.. py:function:: run_fit(fit_payload, container, out_path, output_root, overwrite, *, errors = 'raise')

   Fit one reducer variant and return a fit-runs inventory record.

   If ``_SUCCESS`` already exists in *out_path* and *overwrite* is ``False``,
   the existing artifact is loaded and its inventory record is returned
   immediately (checkpoint resume).

   :param fit_payload: Provenance/config dict describing this fit (reducer,
                       n_components, scope, condition, unit info, input
                       signature, …).
   :param container: Data container for this analysis unit. Must have ``ids``.
   :param out_path: Artifact directory to write (or resume from).
   :param output_root: Root of the entire run output. Used for relative-path
                       computation in the returned inventory record.
   :param overwrite: When ``True``, an existing *out_path* directory is
                     deleted before fitting.
   :param errors: ``"raise"`` (default) propagates exceptions; ``"record"``
                  catches them, logs, and returns a failed inventory record
                  of the same shape.

   :returns: A flat inventory record suitable for passing to
             :func:`update_runs`.
   :rtype: dict


.. py:function:: run_fit_group(requests, *, errors = 'raise')

   Fit a group of requests sharing one analysis unit and reducer.

   All *requests* must describe the same container and reducer, differing
   only in ``n_components`` (as produced by :func:`build_fit_request` for one
   unit's sweep).

   When the reducer is hierarchically nested, the largest ``n_components`` is
   fitted once and the smaller sweep values are synthesised by slicing the
   embedding, components, and explained-variance arrays, which avoids a
   redundant decomposition per sweep value. Non-nested reducers (or singleton
   groups) fall back to an independent :func:`run_fit` per request, so
   behaviour is unchanged for UMAP/t-SNE/ICA/etc.

   Returns one inventory record per request, in resolution order (resumed
   records first, then synthesised ones).


.. py:function:: run_eval(fit_artifact, container, eval_spec, out_path, output_root, overwrite, *, errors = 'raise')

   Run one post-hoc evaluation and return an eval-runs inventory record.

   The fit provenance is read from ``fit_artifact["fit"]``. If ``_SUCCESS``
   already exists in *out_path* and *overwrite* is ``False``, the existing
   eval artifact is loaded and its inventory record is returned immediately
   (checkpoint resume).

   :param fit_artifact: Full fit artifact dict as returned by
                        :func:`load_fit_artifact`.
   :param container: Data container for the analysis unit (must contain the
                     columns referenced by *eval_spec*).
   :param eval_spec: Eval specification dict with keys ``name``,
                     ``target_col``, ``group_col``, ``filters``,
                     ``label_map``.
   :param out_path: Artifact directory to write (or resume from).
   :param output_root: Root of the entire run output.
   :param overwrite: When ``True``, an existing *out_path* directory is
                     deleted before evaluating.
   :param errors: ``"raise"`` (default) propagates exceptions; ``"record"``
                  catches them, logs, and returns a failed inventory record.

   :returns: A flat eval inventory record suitable for passing to
             :func:`update_runs`.
   :rtype: dict


.. py:function:: prepare_eval_inputs(container, fit_ids, eval_spec)

   Align *container* observations to *fit_ids* and apply eval filters.

   The saved fit may cover a different (or differently ordered) subset of
   observations than the current container, so this function aligns by
   observation id (with occurrence-count disambiguation for duplicate ids),
   applies column filters from *eval_spec*, resolves labels and groups, and
   masks out missing values.

   :param container: DataContainer that holds the metadata columns referenced
                     by *eval_spec*.
   :param fit_ids: Observation ids in the order they appear in the saved
                   embedding.
   :param eval_spec: Eval specification dict (``name``, ``target_col``,
                     ``group_col``, ``filters``, ``label_map``).

   :returns: *selected_index* is the pandas :class:`~pandas.Index` into the
             aligned frame (suitable for slicing the embedding array). The
             remaining three are numpy arrays of strings.
   :rtype: tuple of (selected_index, selected_ids, labels, groups)

   :raises ValueError: On missing columns or structural issues.
   :raises RuntimeError: When no valid samples remain after alignment and
                         filtering.


.. py:function:: valid_n_components_for_container(container, n_components)

   Return ``True`` if *n_components* is feasible for *container*'s matrix.


.. py:function:: valid_component_sweep(container, requested)

   Filter *requested* to the component counts feasible for *container*.

   Logs a message if any values are skipped.


.. py:function:: build_auto_pooled_eval_spec(conditions, run_pooled)

   Return a ``condition_separation`` eval spec, or ``None``.

   The spec is only produced when *run_pooled* is ``True`` and at least two
   conditions are present; otherwise condition separation is not meaningful.

   :param conditions: List of condition names that will be included in the
                      pooled container.
   :param run_pooled: Whether the caller intends to run a pooled analysis.


.. py:function:: build_fit_request(*, container, scope, condition, unit_spec, reducer, n_components, input_signature, output_root, overwrite = False, subject_col = 'subject', extra_payload = None, artifact_path = None, artifact_path_factory = None)

   Build a request dictionary suitable for passing to :func:`run_fit`.

   The caller owns project-specific input provenance via *input_signature*
   and optional *extra_payload*. coco-pipe owns the deterministic fit id,
   standard fit payload fields, and default flat artifact path.


.. py:function:: build_eval_request(*, fit_record, eval_spec, container, output_root, overwrite = False, fit_artifact = None, artifact_path = None, artifact_path_factory = None)

   Build a request dictionary suitable for passing to :func:`run_eval`.

   By default, the fit artifact is loaded from ``fit_record['artifact_path']``
   relative to *output_root* and the eval artifact is placed under the flat
   ``artifacts/evals`` directory.
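
The checkpoint-resume behaviour described for :func:`run_fit` and
:func:`run_eval` can be sketched with the standard library alone. This is a
minimal illustration, not coco-pipe's implementation: the helper name
``run_with_checkpoint``, the ``record.json`` file name, and the choice to
write the marker last are assumptions made for the example. Only the
``_SUCCESS`` marker and the *overwrite* semantics come from the descriptions
above.

.. code-block:: python

   import json
   import shutil
   import tempfile
   from pathlib import Path


   def run_with_checkpoint(out_path, overwrite, compute):
       """Hypothetical helper: resume from out_path if _SUCCESS exists."""
       out_path = Path(out_path)
       marker = out_path / "_SUCCESS"
       record_file = out_path / "record.json"  # assumed artifact file name

       if marker.exists() and not overwrite:
           # Checkpoint resume: reuse the saved record, skip the computation.
           return json.loads(record_file.read_text()), True

       if overwrite and out_path.exists():
           # Overwrite: delete the existing artifact directory first.
           shutil.rmtree(out_path)
       out_path.mkdir(parents=True, exist_ok=True)

       record = compute()
       record_file.write_text(json.dumps(record))
       marker.touch()  # assumption: marker is written only after the record
       return record, False


   calls = []


   def fake_fit():
       """Stand-in for an expensive fit; counts how often it runs."""
       calls.append(1)
       return {"status": "ok", "n_components": 2}


   with tempfile.TemporaryDirectory() as tmp:
       target = Path(tmp) / "fit_0001"
       first, resumed_first = run_with_checkpoint(target, False, fake_fit)
       second, resumed_second = run_with_checkpoint(target, False, fake_fit)
       third, resumed_third = run_with_checkpoint(target, True, fake_fit)

In this sketch the second call returns the stored record without invoking
``fake_fit`` again, while the third call, with ``overwrite=True``, discards
the directory and recomputes.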
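
The id alignment with occurrence-count disambiguation described for
:func:`prepare_eval_inputs` can be illustrated as follows. This is a sketch
under stated assumptions, not the library's code: ``occurrence_keys`` and
``align_to_fit_ids`` are hypothetical names, and raising ``ValueError`` for
an unmatched id is an assumption. The real function additionally applies
filters, resolves labels and groups, and masks missing values, none of which
is shown here.

.. code-block:: python

   from collections import Counter


   def occurrence_keys(ids):
       """Pair each id with its occurrence count so duplicates are distinct."""
       seen = Counter()
       keys = []
       for obs_id in ids:
           keys.append((obs_id, seen[obs_id]))
           seen[obs_id] += 1
       return keys


   def align_to_fit_ids(container_ids, fit_ids):
       """Return, for each fit id in order, its row position in the container."""
       position = {
           key: row for row, key in enumerate(occurrence_keys(container_ids))
       }
       fit_keys = occurrence_keys(fit_ids)
       missing = [key for key in fit_keys if key not in position]
       if missing:
           # Assumption: an unmatched id is treated as a structural error.
           raise ValueError(f"ids not found in container: {missing}")
       return [position[key] for key in fit_keys]


   # The container holds "s1" twice and is ordered differently from the fit.
   container_ids = ["s2", "s1", "s1", "s3"]
   fit_ids = ["s1", "s3", "s1"]
   rows = align_to_fit_ids(container_ids, fit_ids)

Here the first ``"s1"`` in *fit_ids* maps to the first ``"s1"`` row of the
container and the second to the second, so ``rows`` is ``[1, 3, 2]``.
Keying on ``(id, occurrence)`` rather than on the id alone is what keeps
duplicate ids from collapsing onto a single row.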