coco_pipe.dim_reduction.pipeline#
Checkpointed fit/eval pipeline for dimensionality-reduction runs.
This module contains the core execution primitives that any project using coco-pipe’s dim-reduction stack can share. Each function is intentionally side-effect-free aside from writing artifact files — the caller controls all paths and inventory updates.
Public API#
- run_fit
Fit one reducer variant on one analysis unit, writing a checkpointed artifact directory.
- run_eval
Run one post-hoc evaluation of a saved embedding, writing a checkpointed eval artifact directory.
- build_auto_pooled_eval_spec
Build the automatic condition_separation eval spec used when pooling is active.
- valid_n_components_for_container
Check whether n_components is feasible for a container’s matrix shape.
- valid_component_sweep
Filter a list of component counts to feasible values.
- prepare_eval_inputs
Align a DataContainer to saved fit ids and resolve eval labels/groups.
- build_fit_request / build_eval_request
Construct request dictionaries that can be passed to run_fit and run_eval.
Functions#
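| supports_nested_components | Whether method can synthesise its whole sweep from one max-n fit. |
| run_fit | Fit one reducer variant and return a fit-runs inventory record. |
| run_fit_group | Fit a group of requests sharing one analysis unit and reducer. |
| run_eval | Run one post-hoc evaluation and return an eval-runs inventory record. |
| prepare_eval_inputs | Align container observations to fit_ids and apply eval filters. |
| valid_n_components_for_container | Return True if n_components is feasible for container’s matrix. |
| valid_component_sweep | Filter requested to the component counts feasible for container. |
| build_auto_pooled_eval_spec | Return a condition_separation eval spec, or None. |
| build_fit_request | Build a request dictionary suitable for passing to run_fit(). |
| build_eval_request | Build a request dictionary suitable for passing to run_eval(). |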
Module Contents#
- coco_pipe.dim_reduction.pipeline.supports_nested_components(method)#
Whether method can synthesise its whole sweep from one max-n fit.
Nested reducers (PCA family, SVD) decompose once at the largest n_components and slice the smaller sweep values out of that single fit; everything else (UMAP, t-SNE, PHATE, Isomap, ICA, …) must fit independently per dimension. Callers use this both to fit efficiently (run_fit_group()) and to decide the parallel grain: a non-nested reducer’s sweep is a set of independent fits that can run as separate tasks rather than one serial group.
Cached because instantiating DimReduction only to read capabilities is wasteful to repeat per request.
- coco_pipe.dim_reduction.pipeline.run_fit(fit_payload, container, out_path, output_root, overwrite, *, errors='raise')#
Fit one reducer variant and return a fit-runs inventory record.
If _SUCCESS already exists in out_path and overwrite is False, the existing artifact is loaded and its inventory record is returned immediately (checkpoint resume).
- Parameters:
fit_payload (dict[str, Any]) – Provenance/config dict describing this fit (reducer, n_components, scope, condition, unit info, input signature, …).
container (coco_pipe.io.DataContainer) – Data container for this analysis unit. Must have ids.
out_path (pathlib.Path) – Artifact directory to write (or resume from).
output_root (pathlib.Path) – Root of the entire run output. Used for relative-path computation in the returned inventory record.
overwrite (bool) – When True, an existing out_path directory is deleted before fitting.
errors (coco_pipe.dim_reduction._constants.ErrorMode) – "raise" (default) propagates exceptions; "record" catches them, logs, and returns a failed inventory record of the same shape.
- Returns:
A flat inventory record suitable for passing to update_runs().
- Return type:
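The checkpoint-resume and error-mode behaviour can be illustrated with a small stand-alone sketch. Everything here is an assumption made for illustration: `checkpointed_run`, the `record.json` file name, and the record keys are hypothetical and do not reflect the artifact layout coco-pipe actually writes. The sketch also clears a partial directory that lacks `_SUCCESS`, which the documentation above does not specify.

```python
import json
import shutil
import tempfile
from pathlib import Path
from typing import Callable


def checkpointed_run(out_path: Path, overwrite: bool,
                     compute: Callable[[], object], *, errors: str = "raise") -> dict:
    """Run `compute()` once per artifact directory, resuming if `_SUCCESS` exists."""
    marker = out_path / "_SUCCESS"
    record_file = out_path / "record.json"  # hypothetical file name
    if marker.exists() and not overwrite:
        # Checkpoint resume: reuse the stored record without recomputing.
        return json.loads(record_file.read_text())
    if out_path.exists():
        # overwrite=True, or a partial directory without a marker (assumption).
        shutil.rmtree(out_path)
    out_path.mkdir(parents=True)
    try:
        record = {"status": "ok", "result": compute(), "error": None}
    except Exception as exc:
        if errors == "raise":
            raise
        # "record" mode: return a failed record with the same keys.
        return {"status": "failed", "result": None, "error": repr(exc)}
    record_file.write_text(json.dumps(record))
    marker.touch()  # written last, so it only marks complete artifacts
    return record


calls = []


def fake_fit() -> int:
    calls.append(1)
    return 42


with tempfile.TemporaryDirectory() as tmp:
    out = Path(tmp) / "fit_0001"
    first = checkpointed_run(out, False, fake_fit)
    second = checkpointed_run(out, False, fake_fit)  # resumed, no refit
    third = checkpointed_run(out, True, fake_fit)    # overwrite forces a refit
```

Writing the marker as the final step is what makes the resume check safe: a crash mid-write leaves no `_SUCCESS`, so the next call recomputes instead of loading a partial artifact.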
- coco_pipe.dim_reduction.pipeline.run_fit_group(requests, *, errors='raise')#
Fit a group of requests sharing one analysis unit and reducer.
All requests must describe the same container and reducer, differing only in n_components (as produced by build_fit_request() for one unit’s sweep). When the reducer is hierarchically nested, the largest n_components is fitted once and the smaller sweep values are synthesised by slicing the embedding, components, and explained-variance arrays, which avoids a redundant decomposition per sweep value. Non-nested reducers (or singleton groups) fall back to an independent run_fit() per request, so behaviour is unchanged for UMAP/t-SNE/ICA/etc.
Returns one inventory record per request, in resolution order (resumed records first, then synthesised ones).
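The fit-once-and-slice idea can be shown with a toy reducer. The example below deliberately swaps PCA for a much simpler stand-in, ranking columns by variance and keeping the top n, because that stand-in is also nested: the top-k columns are always a prefix of the top-n columns. `fit_top_variance` and `fit_group` are hypothetical names, and real nested reducers slice components and explained-variance arrays as well.

```python
from statistics import pvariance


def fit_top_variance(rows, n_components):
    """Toy nested reducer: keep the n highest-variance columns, in rank order."""
    n_cols = len(rows[0])
    variances = [pvariance([row[j] for row in rows]) for j in range(n_cols)]
    order = sorted(range(n_cols), key=lambda j: -variances[j])[:n_components]
    return [[row[j] for j in order] for row in rows]


def fit_group(rows, sweep, nested):
    """Return {n: embedding} for every n in `sweep`."""
    if nested and len(sweep) > 1:
        full = fit_top_variance(rows, max(sweep))  # one fit at the largest n
        return {n: [row[:n] for row in full] for n in sweep}  # slice the rest
    return {n: fit_top_variance(rows, n) for n in sweep}  # independent fits


data = [[1.0, 10.0, 0.0], [2.0, -10.0, 0.1], [3.0, 5.0, 0.0], [4.0, -5.0, 0.1]]
sliced = fit_group(data, [1, 2, 3], nested=True)
independent = fit_group(data, [1, 2, 3], nested=False)
```

For this stand-in, slicing and independent fitting give identical embeddings, which is the property that makes the shortcut safe for nested reducers and unsafe for methods such as UMAP or t-SNE.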
- coco_pipe.dim_reduction.pipeline.run_eval(fit_artifact, container, eval_spec, out_path, output_root, overwrite, *, errors='raise')#
Run one post-hoc evaluation and return an eval-runs inventory record.
The fit provenance is read from fit_artifact["fit"]. If _SUCCESS already exists in out_path and overwrite is False, the existing eval artifact is loaded and its inventory record is returned immediately (checkpoint resume).
- Parameters:
fit_artifact (dict[str, Any]) – Full fit artifact dict as returned by load_fit_artifact().
container (coco_pipe.io.DataContainer) – Data container for the analysis unit (must contain the columns referenced by eval_spec).
eval_spec (dict[str, Any]) – Eval specification dict with keys name, target_col, group_col, filters, label_map.
out_path (pathlib.Path) – Artifact directory to write (or resume from).
output_root (pathlib.Path) – Root of the entire run output.
overwrite (bool) – When True, an existing out_path directory is deleted before evaluating.
errors (coco_pipe.dim_reduction._constants.ErrorMode) – "raise" (default) propagates exceptions; "record" catches them, logs, and returns a failed inventory record.
- Returns:
A flat eval inventory record suitable for passing to update_runs().
- Return type:
- coco_pipe.dim_reduction.pipeline.prepare_eval_inputs(container, fit_ids, eval_spec)#
Align container observations to fit_ids and apply eval filters.
The saved fit may cover a different (or differently ordered) subset of observations than the current container, so this function aligns by observation id (with occurrence-count disambiguation for duplicate ids), applies column filters from eval_spec, resolves labels and groups, and masks out missing values.
- Parameters:
container (coco_pipe.io.DataContainer) – DataContainer that holds the metadata columns referenced by eval_spec.
fit_ids (numpy.ndarray) – Observation ids in the order they appear in the saved embedding.
eval_spec (dict[str, Any]) – Eval specification dict (name, target_col, group_col, filters, label_map).
- Returns:
selected_index is the pandas Index into the aligned frame (suitable for slicing the embedding array). The remaining three are numpy arrays of strings.
- Return type:
- Raises:
ValueError – On missing columns or structural issues.
RuntimeError – When no valid samples remain after alignment and filtering.
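Alignment by observation id with occurrence-count disambiguation can be sketched as follows. This is a plain-Python illustration of the idea only: `occurrence_keys` and `align_positions` are hypothetical helpers, and the real function works on a DataContainer and pandas objects and additionally applies filters, label resolution, and missing-value masking.

```python
from collections import Counter


def occurrence_keys(ids):
    """Turn ids into unique (id, k) keys, where k counts earlier repeats of id."""
    seen = Counter()
    keys = []
    for obs_id in ids:
        keys.append((obs_id, seen[obs_id]))
        seen[obs_id] += 1
    return keys


def align_positions(container_ids, fit_ids):
    """For each fit id, return its row position in the container, or None if absent."""
    lookup = {key: pos for pos, key in enumerate(occurrence_keys(container_ids))}
    return [lookup.get(key) for key in occurrence_keys(fit_ids)]


container_ids = ["s1", "s2", "s1", "s3"]
fit_ids = ["s1", "s1", "s3", "s9"]
positions = align_positions(container_ids, fit_ids)
```

Pairing each id with its occurrence count means the first `"s1"` in the fit maps to the first `"s1"` in the container and the second to the second, so duplicate ids stay distinguishable even when the two sequences are ordered differently.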
- coco_pipe.dim_reduction.pipeline.valid_n_components_for_container(container, n_components)#
Return True if n_components is feasible for container’s matrix.
- Parameters:
container (coco_pipe.io.DataContainer)
n_components (int)
- Return type:
- coco_pipe.dim_reduction.pipeline.valid_component_sweep(container, requested)#
Filter requested to the component counts feasible for container.
Logs a message if any values are skipped.
- Parameters:
container (coco_pipe.io.DataContainer)
requested (collections.abc.Sequence[int])
- Return type:
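The two feasibility helpers can be pictured with a short sketch. The exact rule is not stated on this page, so the bound used here, 1 <= n_components <= min(n_samples, n_features), is an assumption borrowed from the usual limit for matrix decompositions. `is_feasible` and `feasible_sweep` are hypothetical names that take a plain shape tuple instead of a DataContainer.

```python
import logging

logger = logging.getLogger("sweep_sketch")


def is_feasible(shape, n_components):
    """Assumed rule: 1 <= n_components <= min(n_samples, n_features)."""
    n_samples, n_features = shape
    return 1 <= n_components <= min(n_samples, n_features)


def feasible_sweep(shape, requested):
    """Keep feasible values in request order and log any that are dropped."""
    kept = [n for n in requested if is_feasible(shape, n)]
    skipped = [n for n in requested if not is_feasible(shape, n)]
    if skipped:
        logger.info("Skipping infeasible n_components: %s", skipped)
    return kept


wide = feasible_sweep((5, 100), [2, 5, 6])
tall = feasible_sweep((50, 8), [2, 8, 16, 0])
```

Filtering before building fit requests keeps infeasible sweep values out of the inventory, so they are skipped rather than recorded as failed fits.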
- coco_pipe.dim_reduction.pipeline.build_auto_pooled_eval_spec(conditions, run_pooled)#
Return a condition_separation eval spec, or None.
The spec is only produced when run_pooled is True and at least two conditions are present; otherwise condition separation is not meaningful.
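The gating logic can be sketched as below. The spec keys follow the eval-spec keys listed elsewhere on this page (name, target_col, group_col, filters, label_map), but every value other than the spec name is an assumption, including the `"condition"` column name and the choice to count distinct conditions. `auto_pooled_spec` is a hypothetical stand-in, not the library function.

```python
def auto_pooled_spec(conditions, run_pooled):
    """Return a condition-separation spec, or None when it would be meaningless."""
    unique = sorted(set(conditions))  # assumption: distinct conditions are counted
    if not run_pooled or len(unique) < 2:
        return None
    return {
        "name": "condition_separation",
        "target_col": "condition",  # assumed column name
        "group_col": None,          # assumed default
        "filters": {},              # assumed default
        "label_map": None,          # assumed default
    }


spec = auto_pooled_spec(["rest", "task"], run_pooled=True)
no_pooling = auto_pooled_spec(["rest", "task"], run_pooled=False)
one_condition = auto_pooled_spec(["rest"], run_pooled=True)
```

Returning None instead of raising lets the caller append the spec conditionally without special-casing runs that are not pooled.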
- coco_pipe.dim_reduction.pipeline.build_fit_request(*, container, scope, condition, unit_spec, reducer, n_components, input_signature, output_root, overwrite=False, subject_col='subject', extra_payload=None, artifact_path=None, artifact_path_factory=None)#
Build a request dictionary suitable for passing to run_fit().
The caller owns project-specific input provenance via input_signature and optional extra_payload. coco-pipe owns the deterministic fit id, standard fit payload fields, and default flat artifact path.
- Parameters:
container (coco_pipe.io.DataContainer)
scope (str)
condition (str)
reducer (str)
n_components (int)
output_root (pathlib.Path)
overwrite (bool)
subject_col (str)
artifact_path (pathlib.Path | None)
artifact_path_factory (collections.abc.Callable[[dict[str, Any], pathlib.Path], pathlib.Path] | None)
- Return type:
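A deterministic fit id and a custom artifact_path_factory might look like the sketch below. Both functions are hypothetical: the hashing scheme is one common way to derive a stable id from a payload and is not confirmed as the library's scheme, and the factory's argument meaning (payload, then output root) is inferred only from the `Callable[[dict[str, Any], pathlib.Path], pathlib.Path]` annotation.

```python
import hashlib
import json
from pathlib import Path


def deterministic_fit_id(payload: dict) -> str:
    """Hash the canonical JSON form of the payload; key order does not matter."""
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:12]


def nested_path_factory(payload: dict, output_root: Path) -> Path:
    """Hypothetical factory: group artifacts by reducer instead of a flat layout."""
    fit_id = deterministic_fit_id(payload)
    return output_root / "artifacts" / "fits" / payload["reducer"] / fit_id


payload_a = {"reducer": "pca", "n_components": 3, "scope": "pooled"}
payload_b = {"scope": "pooled", "n_components": 3, "reducer": "pca"}  # same content
payload_c = {"reducer": "pca", "n_components": 4, "scope": "pooled"}

id_a = deterministic_fit_id(payload_a)
id_b = deterministic_fit_id(payload_b)
id_c = deterministic_fit_id(payload_c)
path_a = nested_path_factory(payload_a, Path("runs"))
```

Because the id depends only on payload content, re-running the same configuration resolves to the same artifact directory, which is what allows a checkpoint to be found again on resume.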
- coco_pipe.dim_reduction.pipeline.build_eval_request(*, fit_record, eval_spec, container, output_root, overwrite=False, fit_artifact=None, artifact_path=None, artifact_path_factory=None)#
Build a request dictionary suitable for passing to run_eval().
By default, the fit artifact is loaded from fit_record['artifact_path'] relative to output_root and the eval artifact is placed under the flat artifacts/evals directory.
- Parameters:
container (coco_pipe.io.DataContainer)
output_root (pathlib.Path)
overwrite (bool)
artifact_path (pathlib.Path | None)
artifact_path_factory (collections.abc.Callable[[dict[str, Any], dict[str, Any], pathlib.Path], pathlib.Path] | None)
- Return type: