coco_pipe.dim_reduction.BaseReducer

class coco_pipe.dim_reduction.BaseReducer(n_components=2, **kwargs)

Bases: ABC

Abstract base class for all dimensionality reduction implementations.

This class defines the standard interface that all reducers must implement and is safe to subclass for custom reducers. It provides built-in support for model persistence (save/load) using joblib.

For custom reducers operating on nonstandard data layouts, override capabilities so the manager layer can route validation, scoring, plotting, and reporting correctly.

Parameters:
  • n_components (int, default=2) – Target dimensionality of the reduced representation.

  • **kwargs (dict) – Additional keyword arguments, stored in the params attribute and typically forwarded to the wrapped estimator or backend implementation.

Variables:
  • n_components (int) – Target dimensionality of the reduced representation.

  • params (dict) – Additional reducer parameters captured at initialization time.

  • model (Any) – Underlying fitted model object, such as a scikit-learn estimator or a scientific computing backend. This attribute should be populated by fit.

Notes

The capabilities property returns a plain dictionary consumed by the manager and evaluation layers. Custom reducers should declare supported diagnostics and scalar metadata explicitly through this mapping. Common keys include:

  • input_ndim : expected dimensionality of the input container

  • input_layout : semantic layout name such as "standard"

  • has_transform : whether transform is supported

  • has_inverse_transform : whether inverse transforms are available

  • has_components : whether PCA-like components are exposed

  • supported_diagnostics : names returned by get_diagnostics

  • supported_metadata : names of the scalar values returned by get_quality_metadata

  • has_native_plot : whether the reducer exposes its own plotting path

  • is_linear : whether the reducer is linear

  • is_stochastic : whether repeated runs can vary without a fixed seed

  • nested_components : whether the first k components of an n-component fit (k < n) equal a standalone k-component fit, so a sweep can be synthesised by slicing a single max-n fit (true for PCA/SVD, not ICA)
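
The nested_components property can be checked numerically. The following is a minimal sketch that assumes PCA is computed through a plain NumPy SVD; the helper pca_scores is hypothetical and is not part of coco_pipe.

```python
import numpy as np


def pca_scores(X, k):
    """Project X onto its first k principal axes using a plain SVD (illustrative helper)."""
    Xc = X - X.mean(axis=0)                            # centre each feature
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)  # rows of Vt are principal axes
    return Xc @ Vt[:k].T                               # shape (n_samples, k)


rng = np.random.default_rng(0)
X = rng.normal(size=(50, 6))

full = pca_scores(X, 5)    # one fit at the largest size in the sweep
small = pca_scores(X, 2)   # standalone 2-component fit

# With nested components, slicing the large fit reproduces the small one.
nested = np.allclose(full[:, :2], small)
```

For a reducer such as ICA the same comparison would generally fail, which is why such a reducer should leave nested_components unset.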

Examples

>>> from sklearn.decomposition import PCA
>>> from coco_pipe.dim_reduction import BaseReducer
>>>
>>> class CustomPCAReducer(BaseReducer):
...     @property
...     def capabilities(self):
...         return self._merge_capabilities(
...             super().capabilities,
...             is_linear=True,
...             has_components=True,
...             supported_diagnostics=("explained_variance_ratio_",),
...         )
...
...     def fit(self, X, y=None):
...         self.model = PCA(n_components=self.n_components, **self.params)
...         self.model.fit(X)
...         return self
...
...     def transform(self, X):
...         return self.model.transform(X)

property name: str

Return a stable public display name for the reducer.

Return type:

str

abstractmethod fit(X, y=None)

Fit the model to the data.

Parameters:
  • X (ArrayLike) – Training data. Most reducers expect (n_samples, n_features), but reducers with a custom capabilities["input_layout"] may accept other layouts such as snapshot matrices or grouped trajectory tensors.

  • y (ArrayLike, optional) – Optional supervision aligned with the sample axis used by the reducer’s declared input layout.

Returns:

self – The fitted reducer instance.

Return type:

BaseReducer

Notes

Most reducers expect X to have shape (n_samples, n_features). Some reducers operate on alternative layouts and should document those layouts through capabilities.

abstractmethod transform(X)

Apply dimensionality reduction to X.

Parameters:

X (ArrayLike) – New data to transform. Its layout should match the reducer’s declared capabilities.

Returns:

X_new – Reduced representation. The exact output shape depends on the reducer, but the last dimension usually matches n_components.

Return type:

np.ndarray

Raises:

RuntimeError – Raised by concrete implementations when transform is called before fitting or when the reducer does not support out-of-sample transforms.
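
A concrete implementation might guard against unfitted use as in the sketch below. MeanCenterReducer is a hypothetical toy class written without coco_pipe imports so that it runs standalone; it only centres and truncates columns and is not a real reduction algorithm.

```python
class MeanCenterReducer:
    """Hypothetical toy reducer: centres the data and keeps the first n_components columns."""

    def __init__(self, n_components=2):
        self.n_components = n_components
        self.model = None  # populated by fit

    def fit(self, X, y=None):
        n_features = len(X[0])
        means = [sum(row[j] for row in X) / len(X) for j in range(n_features)]
        self.model = {"means": means}
        return self

    def transform(self, X):
        # Fail early and explicitly when called before fit.
        if self.model is None:
            raise RuntimeError("MeanCenterReducer must be fitted before calling transform.")
        means = self.model["means"]
        return [[row[j] - means[j] for j in range(self.n_components)] for row in X]


reducer = MeanCenterReducer(n_components=2)

try:
    reducer.transform([[1.0, 2.0, 3.0]])
    raised = False
except RuntimeError:
    raised = True

train = [[1.0, 2.0, 3.0], [3.0, 4.0, 5.0]]
reduced = reducer.fit(train).transform([[1.0, 2.0, 3.0]])
```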

fit_transform(X, y=None)

Fit the model to data and return the transformed data.

This method usually calls fit and then transform, but reducers may override it for efficiency if the underlying algorithm supports a native combined path.

Parameters:
  • X (ArrayLike) – Training data following the reducer’s declared layout.

  • y (ArrayLike, optional) – Optional supervision aligned with the reducer’s input layout.

Returns:

X_new – Reduced representation returned by transform.

Return type:

np.ndarray
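
The default path and an overriding subclass can be sketched as follows. All classes here are hypothetical stand-ins that mirror the documented interface; the actual BaseReducer.fit_transform may differ in detail.

```python
from abc import ABC, abstractmethod


class SketchReducer(ABC):
    """Hypothetical stand-in for the documented interface; not coco_pipe code."""

    @abstractmethod
    def fit(self, X, y=None):
        """Fit the reducer and return self."""

    @abstractmethod
    def transform(self, X):
        """Return the reduced representation of X."""

    def fit_transform(self, X, y=None):
        # Assumed default path: fit, then transform the same data.
        return self.fit(X, y=y).transform(X)


class FirstColumns(SketchReducer):
    """Toy reducer that keeps the first k columns and records which methods ran."""

    def __init__(self, k=2):
        self.k = k
        self.calls = []

    def fit(self, X, y=None):
        self.calls.append("fit")
        return self

    def transform(self, X):
        self.calls.append("transform")
        return [row[: self.k] for row in X]


class FusedFirstColumns(FirstColumns):
    def fit_transform(self, X, y=None):
        # Override: a combined path that skips the separate fit and transform calls.
        self.calls.append("fit_transform")
        return [row[: self.k] for row in X]


X = [[1, 2, 3], [4, 5, 6]]
default = FirstColumns(k=2)
fused = FusedFirstColumns(k=2)
out_default = default.fit_transform(X)
out_fused = fused.fit_transform(X)
```

Both paths return the same values; the override only changes how they are computed.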

save(filepath)

Persist the reducer to a file.

The default implementation serializes the reducer instance with joblib. Custom reducers should either remain joblib-serializable or override this method and load() with a custom persistence strategy.

Parameters:

filepath (str or Path) – Path to the output file.

Return type:

None

property capabilities: dict[str, Any]

Return reducer capability flags consumed by the manager layer.

Custom reducers with nonstandard inputs should override at least input_ndim and input_layout. Reducers exposing diagnostics or scalar quality metadata should declare them explicitly through supported_diagnostics and supported_metadata.

Returns:

Mapping of reducer capability flags.

Return type:

dict

Notes

The default capabilities describe a typical estimator consuming (samples, features) input and exposing transform.
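
Overriding a few flags while inheriting the rest can be sketched with a plain dictionary merge. The default values, the merge_capabilities helper, and the layout name "grouped_trajectories" below are all assumptions made for illustration; the real defaults and the behaviour of the library's own merge helper are defined in BaseReducer.

```python
# Assumed defaults for illustration only; the real defaults live in BaseReducer.
DEFAULT_CAPABILITIES = {
    "input_ndim": 2,
    "input_layout": "standard",
    "has_transform": True,
    "supported_diagnostics": (),
    "supported_metadata": (),
}


def merge_capabilities(base, **overrides):
    """Return a new mapping with overrides applied, leaving base untouched (hypothetical helper)."""
    merged = dict(base)
    merged.update(overrides)
    return merged


trajectory_caps = merge_capabilities(
    DEFAULT_CAPABILITIES,
    input_ndim=3,
    input_layout="grouped_trajectories",  # hypothetical layout name
    has_transform=False,
)
```

Returning a fresh dictionary keeps the defaults intact, so one subclass's overrides cannot leak into another reducer.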

get_diagnostics()

Return diagnostic arrays or structured artifacts.

Diagnostics are intended for non-scalar outputs such as explained variance curves, eigenvalues, modes, graphs, or training histories. Only names declared in capabilities["supported_diagnostics"] are queried.

Returns:

diagnostics – Dictionary of diagnostic attributes declared in capabilities["supported_diagnostics"].

Return type:

dict

Raises:

RuntimeError – If the reducer has not been fitted.
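
One plausible reading of "only declared names are queried" is sketched below. The function collect_diagnostics and the SimpleNamespace stand-in for a fitted backend are hypothetical; the library's actual lookup logic is not shown on this page.

```python
from types import SimpleNamespace


def collect_diagnostics(model, capabilities):
    """Gather only the attributes named in capabilities["supported_diagnostics"]."""
    if model is None:
        raise RuntimeError("Reducer must be fitted before requesting diagnostics.")
    names = capabilities.get("supported_diagnostics", ())
    # Declared names missing on the model are skipped in this sketch.
    return {name: getattr(model, name) for name in names if hasattr(model, name)}


# Stand-in for a fitted backend with one declared and one undeclared attribute.
fitted = SimpleNamespace(explained_variance_ratio_=[0.7, 0.2], n_iter_=12)
caps = {"supported_diagnostics": ("explained_variance_ratio_",)}

diagnostics = collect_diagnostics(fitted, caps)
```

The undeclared n_iter_ attribute is not returned, even though it exists on the model.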

get_quality_metadata()

Return scalar metadata about the reduction process or quality.

Typical examples include iteration counts, optimization stress, final loss values, or backend-specific convergence flags. Only names declared in capabilities["supported_metadata"] are queried.

Returns:

metadata – Dictionary containing only scalar values corresponding to keys declared in capabilities["supported_metadata"].

Return type:

dict

Raises:

RuntimeError – If the reducer has not been fitted.
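
The scalar-only contract could be enforced as in this sketch. collect_quality_metadata is a hypothetical helper, and treating numbers.Number as the definition of "scalar" is an assumption rather than the library's documented rule.

```python
import numbers
from types import SimpleNamespace


def collect_quality_metadata(model, capabilities):
    """Return declared metadata, keeping only scalar (numeric or boolean) values."""
    if model is None:
        raise RuntimeError("Reducer must be fitted before requesting metadata.")
    metadata = {}
    for name in capabilities.get("supported_metadata", ()):
        value = getattr(model, name, None)
        if isinstance(value, numbers.Number):  # bool, int and float all qualify
            metadata[name] = value
    return metadata


# Stand-in for a fitted backend; embedding_ is declared but is not a scalar.
fitted = SimpleNamespace(n_iter_=12, stress_=0.034, embedding_=[[0.1, 0.2]])
caps = {"supported_metadata": ("n_iter_", "stress_", "embedding_")}

metadata = collect_quality_metadata(fitted, caps)
```

Array-valued attributes such as embedding_ belong in get_diagnostics instead.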

get_components()

Return reducer-defined component-like outputs.

Returns:

Reducer-defined component array.

Return type:

np.ndarray

Raises:

ValueError – If the reducer does not expose public components.

classmethod load(filepath)

Load a reducer from a file.

Parameters:

filepath (str or Path) – Path to the file to load.

Returns:

reducer – The loaded reducer instance.

Return type:

BaseReducer

Notes

This method assumes the reducer was serialized with save or a compatible joblib.dump call.
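
For reducers that cannot remain joblib-serializable, the override route described under save might look like the sketch below. JsonPersistedReducer is a hypothetical stand-in (a real reducer would subclass BaseReducer), and JSON is chosen only for illustration; it works here because the fitted state is a plain dictionary.

```python
import json
import tempfile
from pathlib import Path


class JsonPersistedReducer:
    """Hypothetical stand-in showing a matched save/load override pair."""

    def __init__(self, n_components=2, **kwargs):
        self.n_components = n_components
        self.params = kwargs
        self.model = None  # in this sketch: a plain dict of fitted state

    def save(self, filepath):
        # Write only JSON-friendly state instead of serializing the instance.
        state = {
            "n_components": self.n_components,
            "params": self.params,
            "model": self.model,
        }
        Path(filepath).write_text(json.dumps(state))

    @classmethod
    def load(cls, filepath):
        # Rebuild the instance from the stored state.
        state = json.loads(Path(filepath).read_text())
        reducer = cls(n_components=state["n_components"], **state["params"])
        reducer.model = state["model"]
        return reducer


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "reducer.json"
    original = JsonPersistedReducer(n_components=3, whiten=True)
    original.model = {"means": [0.5, 1.5]}
    original.save(path)
    restored = JsonPersistedReducer.load(path)
```

Overriding save and load together keeps the two in step, so a file written by one can always be read by the other.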