Core Workflow and Configuration

`DimReduction` Manager

DimReduction is the single public manager class. It wraps one reducer, drives fit / transform, delegates scoring to the pure evaluator, runs interpretation analyses, and exposes a consolidated get_summary() payload.

The manager does not cache embeddings. Every method that needs an embedding takes it as an explicit argument.

—

1. Construction

from coco_pipe.dim_reduction import DimReduction
from coco_pipe.dim_reduction.config import UMAPConfig

# By method name
reducer = DimReduction("UMAP", n_components=2, n_neighbors=15)

# With a typed config
reducer = DimReduction(UMAPConfig(n_components=2, n_neighbors=15))

Resolution order for reducer kwargs (later overrides earlier):

1. Fields from the config (if a BaseReducerConfig is passed).
2. The params= dictionary.
3. Additional **kwargs.
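
The precedence above can be sketched with plain dictionaries. This is only an illustration of "later overrides earlier", not the library's code: `resolve_reducer_kwargs` is a hypothetical helper, and it assumes the three sources are merged key by key.

```python
def resolve_reducer_kwargs(config_fields=None, params=None, **kwargs):
    """Merge reducer arguments; later sources override earlier ones.

    Hypothetical helper for illustration only.
    """
    resolved = {}
    # Order matters: config fields first, then params=, then **kwargs.
    for source in (config_fields, params, kwargs):
        if source:
            resolved.update(source)
    return resolved


resolved = resolve_reducer_kwargs(
    config_fields={"n_components": 2, "n_neighbors": 15},
    params={"n_neighbors": 30, "min_dist": 0.1},
    n_neighbors=50,
)
print(resolved)
```

Here `n_neighbors` appears in all three sources, so the keyword argument wins, while `n_components` and `min_dist` survive from the sources that defined them.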

—

2. Lifecycle Methods

reducer.fit(X)                        # fit only
embedding = reducer.transform(X)      # transform an already-fitted reducer
embedding = reducer.fit_transform(X)  # combined

fit returns self; transform and fit_transform return a NumPy array. All three reset the cached metrics_, diagnostics_, metric_records_, and interpretation_ so stale evaluations cannot leak across runs.
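
The reset behavior can be pictured with a toy class. This is a sketch under assumptions, not the real DimReduction: the class name and the `_reset_cache` helper are hypothetical, and only the four cached attribute names come from the text above.

```python
class CacheResettingManager:
    """Toy stand-in for a manager that clears cached evaluations on fit."""

    def __init__(self):
        self._reset_cache()

    def _reset_cache(self):
        # Hypothetical helper: drop every cached evaluation payload.
        self.metrics_ = {}
        self.diagnostics_ = {}
        self.metric_records_ = []
        self.interpretation_ = {}

    def fit(self, X):
        # Refitting invalidates anything computed for the previous fit.
        self._reset_cache()
        self.n_features_ = len(X[0])
        return self


manager = CacheResettingManager()
manager.metrics_["trustworthiness"] = 0.91   # pretend an earlier score() ran
returned = manager.fit([[0.0, 1.0], [1.0, 0.0]])
print(manager.metrics_)
```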

DimReduction.get_components() returns reducer components for linear methods (PCA, IncrementalPCA, DaskPCA, DaskTruncatedSVD). It raises for methods that do not expose components.

—

3. Scoring

scores = reducer.score(
    embedding,                        # required, explicit
    X=X,                              # required for 2D metrics
    n_neighbors=5,                    # single-score neighborhood
    metrics=["trustworthiness", "continuity"],
    k_values=[5, 10, 20],             # multi-scale sweep
    labels=labels,                    # used for trajectory_separation
    groups=groups,                    # for supervised separation metrics
    times=times,                      # for trajectory AUC
    separation_method="centroid",
)

Returns a dict with three keys:

Key	Contents
`metrics`	Scalar metric summary (also cached on `metrics_`).
`metadata`	Scalar descriptive metadata (also cached on `quality_metadata_`).
`diagnostics`	Array / structured diagnostics, e.g. `shepard_distances`, `coranking_matrix`, trajectory timecourses (also cached on `diagnostics_`).

Tidy long-form records are cached on metric_records_ for downstream ranking and reporting.
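
To make "tidy long-form" concrete, the sketch below flattens a multi-scale sweep into one record per (metric, k) pair. The record fields (`method`, `metric`, `k`, `value`) and the helper name are assumptions for illustration; the actual schema of metric_records_ is not specified here.

```python
def to_tidy_records(method, scores_by_k):
    """Flatten {metric: {k: value}} into one row per (metric, k).

    Hypothetical helper; field names are illustrative assumptions.
    """
    records = []
    for metric, per_k in scores_by_k.items():
        for k, value in sorted(per_k.items()):
            records.append(
                {"method": method, "metric": metric, "k": k, "value": value}
            )
    return records


sweep = {
    "trustworthiness": {10: 0.93, 5: 0.95},
    "continuity": {5: 0.90, 10: 0.88},
}
records = to_tidy_records("UMAP", sweep)
for row in records:
    print(row)
```

One row per observation is what makes downstream ranking straightforward: filtering on `metric` and `k` selects exactly the values to compare across reducers.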

2D vs. 3D shape routing

The evaluator chooses standard or trajectory metrics from embedding.shape, not from the reducer name. Pass a (n_samples, n_components) embedding for standard metrics; pass a (n_trajectories, n_times, n_dims) tensor for trajectory metrics. See Trajectory Analysis.
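
Shape-based routing amounts to a check on the number of array dimensions. The function below is a minimal sketch of that idea; `route_by_shape` and its return labels are hypothetical and do not describe the evaluator's internals.

```python
def route_by_shape(shape):
    """Pick a metric family from an embedding's shape tuple.

    Hypothetical helper for illustration only.
    """
    if len(shape) == 2:
        # (n_samples, n_components)
        return "standard"
    if len(shape) == 3:
        # (n_trajectories, n_times, n_dims)
        return "trajectory"
    raise ValueError(f"expected a 2D or 3D embedding, got shape {shape}")


print(route_by_shape((500, 2)))
print(route_by_shape((12, 200, 3)))
```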

—

4. Interpretation

result = reducer.interpret(
    X,
    X_emb=embedding,
    analyses=["correlation", "perturbation", "gradient"],
    feature_names=feature_names,
    n_repeats=5,
    random_state=42,
)

Returns {"analysis": ..., "records": [...]}. The analysis payload is keyed by analysis name; records is tidy long-form ready for plotting and reports. They are cached on interpretation_ and interpretation_records_, respectively.

Supported analyses:

"correlation" — Spearman correlations between input features and embedding axes. Works for any reducer.
"perturbation" — model-agnostic feature importance from per-feature shuffling. Requires a fitted reducer with transform.
"gradient" — encoder saliency for supported torch-based reducers (IVIS, ParametricUMAP, TopologicalAE).

See Feature Interpretation for the math and reducer requirements.
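
As a reminder of what the "correlation" analysis measures, here is a dependency-free sketch of the Spearman statistic for a single feature and a single embedding axis: rank both series, then take the Pearson correlation of the ranks. The helper names are illustrative, and the library's own implementation may differ (for example, by delegating to a numerical library).

```python
def average_ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with this one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = mean_rank
        start = end + 1
    return ranks


def pearson(a, b):
    """Pearson correlation of two equal-length sequences."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / (var_a * var_b) ** 0.5


def spearman(a, b):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(a), average_ranks(b))


feature = [0.1, 0.4, 0.9, 1.6, 2.5]          # one input feature
axis_up = [1.0, 2.0, 5.0, 9.0, 20.0]         # monotone increasing axis
axis_down = [-v for v in axis_up]            # monotone decreasing axis
print(spearman(feature, axis_up), spearman(feature, axis_down))
```

Because only ranks matter, a feature that maps monotonically but nonlinearly onto an axis still scores at the extremes, which is why this analysis works for any reducer.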

—

5. Inspecting Cached State

reducer.get_metrics()           # scalar metrics_
reducer.get_quality_metadata()  # metadata from reducer + evaluator
reducer.get_diagnostics()       # full diagnostics_ payload
reducer.get_summary()           # combined: metrics + metadata + diagnostics
                                # + metric_records + interpretation + capabilities

get_summary() is the canonical input for coco_pipe.report.Report.add_reduction() and is JSON-serializable. It deliberately does not carry an embedding — pass embeddings explicitly to plotting and reporting paths that need them.

—

6. Capabilities

caps = reducer.capabilities
# {'is_linear': True, 'has_components': True, 'has_loss_history': False, ...}

The manager exposes the reducer’s capability dict directly. Capabilities are used by:

the evaluator to skip metrics the reducer cannot support,
coco_pipe.viz.dim_reduction.plot_loss_history() to detect available loss curves,
MethodSelector for capability-aware filtering in comparison tables.
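
Capability-aware skipping can be sketched as a lookup from each metric to the capability flag it needs. The capability keys below come from the example dict above; the requirements table, the metric names other than `trustworthiness`, and the helper itself are hypothetical.

```python
def supported_metrics(requested, capabilities, requirements):
    """Keep only metrics whose required capability flag is truthy.

    Hypothetical helper; `requirements` maps metric name -> capability key,
    with None meaning "no special capability needed".
    """
    supported = []
    for name in requested:
        needed = requirements.get(name)
        if needed is None or capabilities.get(needed, False):
            supported.append(name)
    return supported


capabilities = {"is_linear": True, "has_components": True, "has_loss_history": False}
requirements = {
    "trustworthiness": None,                     # works for any reducer
    "component_sparsity": "has_components",      # illustrative metric name
    "loss_convergence": "has_loss_history",      # illustrative metric name
}
kept = supported_metrics(
    ["trustworthiness", "component_sparsity", "loss_convergence"],
    capabilities,
    requirements,
)
print(kept)
```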

—

7. Persistence

reducer.save("models/umap.pkl")
loaded = DimReduction.load("models/umap.pkl", method="UMAP")

save writes the fitted reducer with its kwargs and method name; load re-instantiates the manager and restores the reducer’s fitted state. Cached evaluation payloads are not persisted — re-run score() on the embedding you have.
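
The split between what is persisted and what is not can be illustrated with a plain pickle round trip. This sketch assumes a pickle-style payload with hypothetical keys (`method`, `kwargs`, `fitted_state`); the actual on-disk format used by save is not specified here.

```python
import pickle
import tempfile
from pathlib import Path


def save_payload(path, method, kwargs, fitted_state):
    """Write only the reducer identity, its kwargs, and its fitted state."""
    payload = {"method": method, "kwargs": kwargs, "fitted_state": fitted_state}
    with open(path, "wb") as f:
        pickle.dump(payload, f)


def load_payload(path):
    """Read the payload back; evaluation caches are intentionally absent."""
    with open(path, "rb") as f:
        return pickle.load(f)


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "umap.pkl"
    save_payload(path, "UMAP", {"n_components": 2}, {"n_samples_seen": 100})
    restored = load_payload(path)

print(sorted(restored))
```

As with any pickle-based format, only load files from sources you trust.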

—

8. End-to-End Skeleton

from coco_pipe.dim_reduction import DimReduction
from coco_pipe.viz import plot_embedding, plot_metrics

# X, feature_names, and class_ids are assumed to be defined by the caller.
reducer = DimReduction("UMAP", n_components=2, n_neighbors=15, random_state=42)
embedding = reducer.fit_transform(X)
reducer.score(embedding, X=X, k_values=[5, 10, 20])
reducer.interpret(X, X_emb=embedding, analyses=["correlation"],
                  feature_names=feature_names)
summary = reducer.get_summary()

plot_embedding(embedding, labels=class_ids)
plot_metrics(reducer)                    # accepts the manager directly

For a multi-reducer comparison, see MethodSelector in Evaluation and Interpretation.

Configuration Reference

Every reducer in coco_pipe.dim_reduction accepts either keyword arguments to DimReduction or a typed pydantic config. Configs validate field names, types, and ranges at parse time, so typos and incompatible options fail before any data is touched.
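
The fail-early behavior can be imitated with the standard library. The sketch below swaps pydantic for a frozen dataclass with a `__post_init__` check, so it is a stand-in rather than the real config class: `ToyUMAPConfig`, `try_build`, and the specific range rules are assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyUMAPConfig:
    """Stand-in for a typed config; the real classes are pydantic models."""

    n_components: int = 2
    n_neighbors: int = 15
    min_dist: float = 0.1

    def __post_init__(self):
        # Illustrative range rules, not the library's actual constraints.
        if not isinstance(self.n_components, int) or self.n_components < 1:
            raise ValueError("n_components must be a positive integer")
        if not isinstance(self.n_neighbors, int) or self.n_neighbors < 2:
            raise ValueError("n_neighbors must be an integer >= 2")
        if self.min_dist < 0:
            raise ValueError("min_dist must be non-negative")


def try_build(**fields):
    """Return the config, or the name of the error raised while building it."""
    try:
        return ToyUMAPConfig(**fields)
    except (TypeError, ValueError) as exc:
        return type(exc).__name__


print(try_build(n_neighbors=30))
print(try_build(n_neighbours=30))   # misspelled field name
print(try_build(n_neighbors=0))     # out-of-range value
```

Both bad inputs fail at construction, before any reducer is created or data is loaded.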

—

1. The Two Equivalent Construction Styles

from coco_pipe.dim_reduction import DimReduction
from coco_pipe.dim_reduction.config import UMAPConfig

# Keyword-style (string method name + kwargs)
reducer = DimReduction("UMAP", n_components=2, n_neighbors=15, min_dist=0.1)

# Config-style (typed pydantic model)
reducer = DimReduction(UMAPConfig(n_components=2, n_neighbors=15, min_dist=0.1))

Prefer the config style when:

Reducer parameters come from a YAML or JSON file (unpack the loaded mapping into the config class, as in UMAPConfig(**data)).
You want a single object that can be serialized, logged, or reused across experiments.
You want strict validation of every field at construction time.

Use the keyword style for one-off scripts and exploration.

—

2. Base Config

from coco_pipe.dim_reduction.config import BaseReducerConfig

All reducer configs inherit from BaseReducerConfig. Common fields:

method — canonical reducer name (Literal, immutable per subclass).
n_components — target dimensionality, default 2.

A mixin, StochasticReducerConfig, adds a random_state field (default 42) for reducers that accept a seed.

—

3. Reducer Configs

The full set of typed configs maps 1:1 with the registry in coco_pipe.dim_reduction.config.METHODS.

Family	Config class	Key fields
Linear	`PCAConfig`	`whiten`, `svd_solver`
	`IncrementalPCAConfig`	`batch_size`, `whiten`
	`DaskPCAConfig`	`svd_solver`
	`DaskTruncatedSVDConfig`	`algorithm`
Manifold	`IsomapConfig`	`n_neighbors`, `metric`, `p`
	`LLEConfig`	`n_neighbors`, `lle_method`
	`MDSConfig`	`metric`, `n_init`, `dissimilarity`
	`SpectralEmbeddingConfig`	`affinity`, `gamma`
Neighbor	`TSNEConfig`	`perplexity`, `early_exaggeration`, `learning_rate`, `max_iter`, `init`
	`UMAPConfig`	`n_neighbors`, `min_dist`, `metric`, `spread`
	`ParametricUMAPConfig`	`n_neighbors`, `min_dist`, `batch_size`
	`PacmapConfig`	`n_neighbors`, `MN_ratio`, `FP_ratio`, `nn_backend`, `init`
	`TrimapConfig`	`n_inliers`, `n_outliers`, `n_random`
	`PHATEConfig`	`knn`, `decay`, `t`
Spatiotemporal	`DMDConfig`	`tlsq_rank`, `exact`, `opt`
	`TRCAConfig`	`sfreq`, `filterbank`
Neural / Topology	`IVISConfig`	`k`, `model`, `n_epochs_without_progress`
	`TopologicalAEConfig`	`hidden_dims`, `lam`, `lr`, `batch_size`, `epochs`, `device`

See Reducer Catalog for what each reducer does and when to use it.

3.1 Example: full UMAP config

from coco_pipe.dim_reduction.config import UMAPConfig

config = UMAPConfig(
    n_components=2,
    n_neighbors=15,
    min_dist=0.1,
    metric="euclidean",
    spread=1.0,
    random_state=42,
)

3.2 Example: LLE parameter renaming

The pydantic method field is reserved for reducer selection, so LLE’s sklearn parameter method is exposed as lle_method and renamed back via LLEConfig.to_reducer_kwargs():

from coco_pipe.dim_reduction.config import LLEConfig

config = LLEConfig(n_components=2, n_neighbors=10, lle_method="hessian")
# to_reducer_kwargs() yields {"n_components": 2, "n_neighbors": 10, "method": "hessian"}
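
One way such a rename could be implemented is shown below with a dataclass stand-in. `ToyLLEConfig` is hypothetical and omits the real config's own method field (the one reserved for reducer selection); it only demonstrates mapping lle_method back to the sklearn keyword.

```python
from dataclasses import asdict, dataclass


@dataclass
class ToyLLEConfig:
    """Stand-in for LLEConfig; the real class is a pydantic model."""

    n_components: int = 2
    n_neighbors: int = 5
    lle_method: str = "standard"

    def to_reducer_kwargs(self):
        kwargs = asdict(self)
        # Rename the config-side field to the keyword sklearn expects.
        kwargs["method"] = kwargs.pop("lle_method")
        return kwargs


kwargs = ToyLLEConfig(
    n_components=2, n_neighbors=10, lle_method="hessian"
).to_reducer_kwargs()
print(kwargs)
```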

—

4. Evaluation Config

from coco_pipe.dim_reduction.config import EvaluationConfig

eval_config = EvaluationConfig(
    metrics=["trustworthiness", "continuity", "lcmc"],
    k_range=[5, 10, 20, 50, 100],
    selection_metric="trustworthiness",
    selection_k=10,
    tie_breakers=["continuity"],
    separation_method="centroid",
)

Fields:

`metrics`	Metric families to compute. Must be canonical evaluator names (see Evaluation and Interpretation). No duplicates; at least one entry.
`k_range`	Neighborhood sizes for multi-scale metrics (`trustworthiness`, `continuity`, `lcmc`, `mrre_*`). Positive integers, no duplicates.
`selection_metric`	Primary ranking metric. Must be in `_VALID_RANKING_METRICS` and in `metrics`.
`selection_k`	Neighborhood size used when ranking a `k`-scoped metric.
`tie_breakers`	Ordered list of additional ranking metrics. Each must also be present in `metrics`.
`separation_method`	Definition used for trajectory separation: `"centroid"` (default), `"within_between_ratio"`, `"mahalanobis"`, `"distributional"`, `"margin"`.

Early validation pays off

EvaluationConfig rejects unknown metric names, duplicate entries, invalid separation methods, and ranking metrics that are not in metrics. You won’t run a 10-minute scoring loop only to find the ranker has nothing to rank with.
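
The cross-field rules listed above can be written as a small checker. This sketch covers only the rules stated in the field table (non-empty and duplicate-free metrics, positive duplicate-free k_range, ranking metrics present in metrics); it does not check names against the canonical evaluator list or `_VALID_RANKING_METRICS`, and `validate_evaluation` is a hypothetical helper rather than the config's validator.

```python
def validate_evaluation(metrics, k_range, selection_metric, tie_breakers=()):
    """Return a list of human-readable problems; empty means valid."""
    errors = []
    if not metrics:
        errors.append("metrics must contain at least one entry")
    if len(set(metrics)) != len(metrics):
        errors.append("metrics contains duplicates")
    if any(not isinstance(k, int) or k <= 0 for k in k_range):
        errors.append("k_range must contain positive integers")
    if len(set(k_range)) != len(k_range):
        errors.append("k_range contains duplicates")
    if selection_metric not in metrics:
        errors.append(f"selection_metric {selection_metric!r} is not in metrics")
    for name in tie_breakers:
        if name not in metrics:
            errors.append(f"tie breaker {name!r} is not in metrics")
    return errors


good = validate_evaluation(
    ["trustworthiness", "continuity"], [5, 10, 20], "trustworthiness", ["continuity"]
)
bad = validate_evaluation(
    ["trustworthiness", "continuity"], [5, 10, 10], "lcmc", ["continuity"]
)
print(good)
print(bad)
```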

—

5. Configs from YAML / JSON

All configs are standard pydantic models, so loading from a serialized form is direct:

import yaml
from coco_pipe.dim_reduction import DimReduction
from coco_pipe.dim_reduction.config import UMAPConfig

with open("umap.yaml") as f:
    data = yaml.safe_load(f)
config = UMAPConfig(**data)             # validation runs here

reducer = DimReduction(config)
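
The JSON path follows the same pattern using the standard library's json module. To stay runnable without the package installed, this sketch uses a frozen dataclass named `ToyUMAPConfig` as a stand-in for the typed config; unlike a pydantic model it rejects unknown keys but does not validate types or ranges.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyUMAPConfig:
    """Stand-in for a typed config; the real classes are pydantic models."""

    n_components: int = 2
    n_neighbors: int = 15
    min_dist: float = 0.1


raw = '{"n_components": 2, "n_neighbors": 30, "min_dist": 0.05}'
data = json.loads(raw)            # same role as yaml.safe_load above
config = ToyUMAPConfig(**data)    # unknown keys raise TypeError here
print(config)
```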