Core Workflow and Configuration#
DimReduction Manager#
DimReduction is the single public manager
class. It wraps one reducer, drives fit / transform, delegates scoring
to the pure evaluator, runs interpretation analyses, and exposes a consolidated
get_summary() payload.
The manager does not cache embeddings. Every method that needs an embedding takes it as an explicit argument.
—
1. Construction#
from coco_pipe.dim_reduction import DimReduction
from coco_pipe.dim_reduction.config import UMAPConfig
# By method name
reducer = DimReduction("UMAP", n_components=2, n_neighbors=15)
# With a typed config
reducer = DimReduction(UMAPConfig(n_components=2, n_neighbors=15))
Resolution order for reducer kwargs (later overrides earlier):
1. Fields from the config (if a BaseReducerConfig is passed).
2. The params= dictionary.
3. Additional **kwargs.
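The precedence rule above can be illustrated with plain dictionaries. This is a minimal sketch, not the manager's own code: the helper `resolve_reducer_kwargs` and its argument names are hypothetical, and only the ordering (config fields, then `params`, then `**kwargs`) is taken from the text.

```python
def resolve_reducer_kwargs(config_fields=None, params=None, **kwargs):
    """Merge three kwarg sources; later sources override earlier ones.

    Hypothetical helper for illustration only.
    """
    resolved = {}
    for source in (config_fields, params, kwargs):
        if source:
            resolved.update(source)  # later update wins on key clashes
    return resolved


merged = resolve_reducer_kwargs(
    config_fields={"n_components": 2, "n_neighbors": 15},
    params={"n_neighbors": 30, "min_dist": 0.1},
    n_neighbors=50,
)
print(merged)
```

Here `n_neighbors` appears in all three sources, so the keyword argument wins, while `n_components` and `min_dist` pass through from the only source that sets them.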
—
2. Lifecycle Methods#
reducer.fit(X) # fit only
embedding = reducer.transform(X) # transform an already-fitted reducer
embedding = reducer.fit_transform(X) # combined
fit() returns self; transform() and fit_transform() return a NumPy array.
All three reset the cached metrics_, diagnostics_, metric_records_, and
interpretation_ so stale evaluations cannot leak across runs.
DimReduction.get_components() returns reducer components for linear
methods (PCA, IncrementalPCA, DaskPCA, DaskTruncatedSVD). It raises for
methods that do not expose components.
—
3. Scoring#
scores = reducer.score(
    embedding,                 # required, explicit
    X=X,                       # required for 2D metrics
    n_neighbors=5,             # single-score neighborhood
    metrics=["trustworthiness", "continuity"],
    k_values=[5, 10, 20],      # multi-scale sweep
    labels=labels,             # used for trajectory_separation
    groups=groups,             # for supervised separation metrics
    times=times,               # for trajectory AUC
    separation_method="centroid",
)
Returns a dict with three entries:
- a scalar metric summary (also cached on metrics_),
- scalar descriptive metadata (also cached on the manager),
- array / structured diagnostics.
Tidy long-form records are cached on metric_records_ for downstream
ranking and reporting.
2D vs. 3D shape routing
The evaluator chooses standard or trajectory metrics from embedding.shape,
not from the reducer name. Pass a (n_samples, n_components) embedding
for standard metrics; pass a (n_trajectories, n_times, n_dims) tensor
for trajectory metrics. See Trajectory Analysis.
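The routing rule can be sketched as a function of the embedding's shape alone. This is an illustration under stated assumptions: `select_metric_family` is a hypothetical name, and raising on any other dimensionality is an assumption, not documented evaluator behavior.

```python
def select_metric_family(shape):
    """Pick a metric family from an embedding's shape (hypothetical helper)."""
    if len(shape) == 2:
        return "standard"      # (n_samples, n_components)
    if len(shape) == 3:
        return "trajectory"    # (n_trajectories, n_times, n_dims)
    # Assumption: anything else is rejected rather than silently reshaped.
    raise ValueError(f"expected a 2D or 3D embedding, got shape {shape}")


print(select_metric_family((500, 2)))
print(select_metric_family((12, 40, 3)))
```

With a NumPy array you would pass `embedding.shape`; the reducer name never enters the decision.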
—
4. Interpretation#
result = reducer.interpret(
    X,
    X_emb=embedding,
    analyses=["correlation", "perturbation", "gradient"],
    feature_names=feature_names,
    n_repeats=5,
    random_state=42,
)
Returns {"analysis": ..., "records": [...]}. The analysis payload is
keyed by analysis name; records is tidy long-form ready for plotting and
reports. Both are cached on interpretation_ and interpretation_records_.
Supported analyses:
- "correlation": Spearman correlations between input features and embedding axes. Works for any reducer.
- "perturbation": model-agnostic feature importance from per-feature shuffling. Requires a fitted reducer with transform.
- "gradient": encoder saliency for supported torch-based reducers (IVIS, ParametricUMAP, TopologicalAE).
See Feature Interpretation for the math and reducer requirements.
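To make the "correlation" analysis concrete, here is a standard-library sketch of a Spearman correlation between one input feature and one embedding axis. It is not the library's implementation: the function names and the toy data are made up for illustration, and it assumes neither input is constant (a constant column would cause a division by zero).

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean of 1-based positions in the run
        for pos in range(start, end + 1):
            ranks[order[pos]] = shared
        start = end + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


feature = [0.1, 0.4, 0.9, 1.6, 2.5]      # one input feature (toy data)
axis_up = [1.0, 2.0, 5.0, 9.0, 20.0]     # embedding axis, monotone increasing
axis_down = [3.0, 2.5, 1.0, 0.5, -4.0]   # embedding axis, monotone decreasing

rho_up = spearman(feature, axis_up)
rho_down = spearman(feature, axis_down)
print(rho_up, rho_down)
```

Because Spearman works on ranks, any monotone relationship scores at the extremes even when it is strongly nonlinear, which is why this analysis suits nonlinear reducers as well as linear ones.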
—
5. Inspecting Cached State#
reducer.get_metrics() # scalar metrics_
reducer.get_quality_metadata() # metadata from reducer + evaluator
reducer.get_diagnostics() # full diagnostics_ payload
reducer.get_summary() # combined: metrics + metadata + diagnostics
# + metric_records + interpretation + capabilities
get_summary() is the canonical input for coco_pipe.report.Report.add_reduction()
and is JSON-serializable. It deliberately does not carry an embedding —
pass embeddings explicitly to plotting and reporting paths that need them.
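A JSON-serializable summary can be written to disk and read back without custom encoders. The sketch below uses a hand-written, summary-shaped dict: the key names and every value are placeholders chosen for illustration, not the documented layout of `get_summary()`.

```python
import json

# Placeholder payload; real keys and values come from reducer.get_summary().
summary = {
    "metrics": {"trustworthiness": 0.93, "continuity": 0.91},
    "metadata": {"method": "UMAP", "n_components": 2},
    "capabilities": {"is_linear": False, "has_components": False},
}

# Works because every value is a plain Python type (dict, str, int, float, bool).
encoded = json.dumps(summary, sort_keys=True)
restored = json.loads(encoded)
print(restored == summary)
```

An embedding array would not survive this round trip without conversion, which is one practical reason to keep it out of the summary and pass it separately.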
—
6. Capabilities#
caps = reducer.capabilities
# {'is_linear': True, 'has_components': True, 'has_loss_history': False, ...}
The manager exposes the reducer’s capability dict directly. Capabilities are used by:
- the evaluator, to skip metrics the reducer cannot support,
- coco_pipe.viz.dim_reduction.plot_loss_history(), to detect available loss curves,
- MethodSelector, for capability-aware filtering in comparison tables.
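Capability-aware filtering reduces to matching flags in a dict. The sketch below is illustrative only: `filter_by_capabilities` is a hypothetical helper, and the two capability dicts are hand-written stand-ins rather than values returned by any real reducer.

```python
def filter_by_capabilities(candidates, **required):
    """Keep the names whose capability dict matches every required flag."""
    return [
        name
        for name, caps in candidates.items()
        if all(caps.get(flag) == value for flag, value in required.items())
    ]


# Hand-written capability dicts; real ones come from reducer.capabilities.
candidates = {
    "linear_example": {"is_linear": True, "has_components": True, "has_loss_history": False},
    "neural_example": {"is_linear": False, "has_components": False, "has_loss_history": True},
}

with_components = filter_by_capabilities(candidates, has_components=True)
with_loss = filter_by_capabilities(candidates, has_loss_history=True)
print(with_components, with_loss)
```

A missing flag is treated as a mismatch here (`caps.get` returns `None`), which is a conservative choice for this sketch.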
—
7. Persistence#
reducer.save("models/umap.pkl")
loaded = DimReduction.load("models/umap.pkl", method="UMAP")
save writes the fitted reducer with its kwargs and method name; load
re-instantiates the manager and restores the reducer’s fitted state. Cached
evaluation payloads are not persisted — re-run score() on the
embedding you have.
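The split between persisted state and cached evaluations can be sketched with a toy class. This is not how `save` / `load` are implemented: `ManagerSketch` is hypothetical, JSON stands in for whatever serializer the library uses, and the metric value is a placeholder.

```python
import json


class ManagerSketch:
    """Toy stand-in for a manager; not the real DimReduction class."""

    def __init__(self, method, **reducer_kwargs):
        self.method = method
        self.reducer_kwargs = reducer_kwargs
        self.metrics_ = {}  # cached evaluation payload, never persisted

    def dumps(self):
        # Persist only the method name and kwargs; skip cached evaluations.
        return json.dumps({"method": self.method, "kwargs": self.reducer_kwargs})

    @classmethod
    def loads(cls, payload):
        state = json.loads(payload)
        return cls(state["method"], **state["kwargs"])


manager = ManagerSketch("UMAP", n_components=2, n_neighbors=15)
manager.metrics_ = {"trustworthiness": 0.9}  # placeholder cached score

loaded = ManagerSketch.loads(manager.dumps())
print(loaded.method, loaded.reducer_kwargs, loaded.metrics_)
```

The reloaded object has the same configuration but an empty cache, which mirrors why scores have to be recomputed after loading.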
—
8. End-to-End Skeleton#
from coco_pipe.dim_reduction import DimReduction
from coco_pipe.viz import plot_embedding, plot_metrics
reducer = DimReduction("UMAP", n_components=2, n_neighbors=15, random_state=42)
embedding = reducer.fit_transform(X)
reducer.score(embedding, X=X, k_values=[5, 10, 20])
reducer.interpret(X, X_emb=embedding, analyses=["correlation"],
                  feature_names=feature_names)
summary = reducer.get_summary()
plot_embedding(embedding, labels=class_ids)
plot_metrics(reducer) # accepts the manager directly
For a multi-reducer comparison, see MethodSelector in Evaluation and Interpretation.
Configuration Reference#
Every reducer in coco_pipe.dim_reduction accepts either keyword arguments
to DimReduction or a typed pydantic config.
Configs validate field names, types, and ranges at parse time, so typos and
incompatible options fail before any data is touched.
—
1. The Two Equivalent Construction Styles#
from coco_pipe.dim_reduction import DimReduction
from coco_pipe.dim_reduction.config import UMAPConfig
# Keyword-style (string method name + kwargs)
reducer = DimReduction("UMAP", n_components=2, n_neighbors=15, min_dist=0.1)
# Config-style (typed pydantic model)
reducer = DimReduction(UMAPConfig(n_components=2, n_neighbors=15, min_dist=0.1))
Prefer the config style when:
- Reducer parameters come from a YAML or JSON file (pass via **config).
- You want a single object that can be serialized, logged, or reused across experiments.
- You want strict validation of every field at construction time.
Use the keyword style for one-off scripts and exploration.
—
2. Base Config#
from coco_pipe.dim_reduction.config import BaseReducerConfig
All reducer configs inherit from BaseReducerConfig. Common fields:
- method: canonical reducer name (Literal, immutable per subclass).
- n_components: target dimensionality, default 2.
A mixin, StochasticReducerConfig, adds a random_state
field (default 42) for reducers that accept a seed.
—
3. Reducer Configs#
The full set of typed configs maps 1:1 with the registry in
coco_pipe.dim_reduction.config.METHODS.
The configs are grouped into five families: Linear, Manifold, Neighbor,
Spatiotemporal, and Neural / Topology. Each entry pairs a config class with
its key fields.
See Reducer Catalog for what each reducer does and when to use it.
3.1 Example: full UMAP config#
from coco_pipe.dim_reduction.config import UMAPConfig
config = UMAPConfig(
    n_components=2,
    n_neighbors=15,
    min_dist=0.1,
    metric="euclidean",
    spread=1.0,
    random_state=42,
)
3.2 Example: LLE name renaming#
The pydantic method field is reserved for reducer selection, so LLE’s
sklearn parameter method is exposed as lle_method and renamed back
via LLEConfig.to_reducer_kwargs():
from coco_pipe.dim_reduction.config import LLEConfig
config = LLEConfig(n_components=2, n_neighbors=10, lle_method="hessian")
# to_reducer_kwargs() yields {"n_components": 2, "n_neighbors": 10, "method": "hessian"}
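The renaming step can be sketched with a plain dataclass. `LLEConfigSketch` is a hypothetical stand-in, not the real `LLEConfig`; its defaults are chosen for this example only, and the body of `to_reducer_kwargs()` is an assumption about one way to produce the mapping shown above.

```python
from dataclasses import asdict, dataclass


@dataclass
class LLEConfigSketch:
    """Stand-in for a typed config; not the real LLEConfig."""

    n_components: int = 2
    n_neighbors: int = 5
    lle_method: str = "standard"
    method: str = "LLE"  # reserved for reducer selection

    def to_reducer_kwargs(self):
        kwargs = asdict(self)
        kwargs.pop("method")                         # drop the selector field
        kwargs["method"] = kwargs.pop("lle_method")  # restore the estimator's name
        return kwargs


sketch = LLEConfigSketch(n_components=2, n_neighbors=10, lle_method="hessian")
sketch_kwargs = sketch.to_reducer_kwargs()
print(sketch_kwargs)
```

Keeping the selector and the estimator parameter under different names on the config avoids a clash, and the translation happens in exactly one place.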
—
4. Evaluation Config#
from coco_pipe.dim_reduction.config import EvaluationConfig
eval_config = EvaluationConfig(
    metrics=["trustworthiness", "continuity", "lcmc"],
    k_range=[5, 10, 20, 50, 100],
    selection_metric="trustworthiness",
    selection_k=10,
    tie_breakers=["continuity"],
    separation_method="centroid",
)
Fields:
- metrics: Metric families to compute. Must be canonical evaluator names (see Evaluation and Interpretation). No duplicates; at least one entry.
- k_range: Neighborhood sizes for multi-scale metrics.
- selection_metric: Primary ranking metric. Must be in metrics.
- selection_k: Neighborhood size used when ranking.
- tie_breakers: Ordered list of additional ranking metrics. Each must also be present in metrics.
- separation_method: Separation definition for trajectory separation, for example "centroid".
Early validation pays off
EvaluationConfig rejects unknown metric names, duplicate entries,
invalid separation methods, and ranking metrics that are not in metrics.
You won’t run a 10-minute scoring loop only to find the ranker has nothing
to rank with.
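The cross-field checks described here can be mimicked with a dataclass. This is a stand-in, not the pydantic model: `EvaluationConfigSketch` and `try_build` are hypothetical, the error messages are invented, and the unknown-metric-name check is omitted because the canonical name list is not given in this section.

```python
from dataclasses import dataclass, field


@dataclass
class EvaluationConfigSketch:
    """Dataclass stand-in that mimics some documented checks."""

    metrics: list
    selection_metric: str
    tie_breakers: list = field(default_factory=list)

    def __post_init__(self):
        if not self.metrics:
            raise ValueError("metrics must contain at least one entry")
        if len(set(self.metrics)) != len(self.metrics):
            raise ValueError("metrics must not contain duplicates")
        # Every ranking metric has to be one of the computed metrics.
        for name in [self.selection_metric, *self.tie_breakers]:
            if name not in self.metrics:
                raise ValueError(f"ranking metric {name!r} is not in metrics")


def try_build(**kwargs):
    """Return the validation error message, or None when accepted."""
    try:
        EvaluationConfigSketch(**kwargs)
    except ValueError as exc:
        return str(exc)
    return None


ok = try_build(
    metrics=["trustworthiness", "continuity"],
    selection_metric="trustworthiness",
    tie_breakers=["continuity"],
)
bad = try_build(metrics=["trustworthiness"], selection_metric="lcmc")
dup = try_build(metrics=["continuity", "continuity"], selection_metric="continuity")
print(ok, bad, dup)
```

All three failures surface at construction time, before any embedding is scored.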
—
5. Configs from YAML / JSON#
All configs are standard pydantic models, so loading from a serialized form is direct:
import yaml
from coco_pipe.dim_reduction import DimReduction
from coco_pipe.dim_reduction.config import UMAPConfig

# Read the raw parameters from disk
with open("umap.yaml") as f:
    data = yaml.safe_load(f)

config = UMAPConfig(**data) # validation runs here
reducer = DimReduction(config)