coco_pipe.dim_reduction.BaseReducer
- class coco_pipe.dim_reduction.BaseReducer(n_components=2, **kwargs)
Bases: ABC
Abstract base class for all dimensionality reduction implementations.
This class defines the standard interface that all reducers must implement and is safe to subclass for custom reducers. It provides built-in support for model persistence (save/load) using joblib.
For custom reducers operating on nonstandard data layouts, override capabilities so the manager layer can route validation, scoring, plotting, and reporting correctly.
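- Parameters:
n_components (int, default=2) – Target dimensionality of the reduced representation.
**kwargs – Additional reducer parameters, stored in params.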
- Variables:
n_components (int) – Target dimensionality of the reduced representation.
params (dict) – Additional reducer parameters captured at initialization time.
model (Any) – Underlying fitted model object, such as a scikit-learn estimator or a scientific computing backend. This attribute should be populated by fit.
Notes
The capabilities property returns a plain dictionary consumed by the manager and evaluation layers. Custom reducers should declare supported diagnostics and scalar metadata explicitly through this mapping. Common keys include:
input_ndim : expected dimensionality of the input container
input_layout : semantic layout name such as “standard”
has_transform : whether transform is supported
has_inverse_transform : whether inverse transforms are available
has_components : whether PCA-like components are exposed
supported_diagnostics : names returned by get_diagnostics
has_native_plot : whether the reducer exposes its own plotting path
is_linear : whether the reducer is linear
is_stochastic : whether repeated runs can vary without a fixed seed
nested_components : whether the first k components of an n-component fit (k < n) equal a standalone k-component fit, so a sweep can be synthesised by slicing a single max-n fit (true for PCA/SVD, not ICA)
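As a rough illustration of how a consumer might read this mapping, the sketch below builds a capability dictionary for a PCA-like reducer and derives follow-up steps from its flags. The key names come from the list above; the values and the `plan_steps` helper are hypothetical and not part of coco_pipe.

```python
from typing import Any

# Hypothetical capability mapping for a PCA-like reducer; the keys come from
# the documented list, the values are illustrative assumptions.
pca_like: dict[str, Any] = {
    "input_ndim": 2,
    "input_layout": "standard",
    "has_transform": True,
    "has_inverse_transform": True,
    "has_components": True,
    "supported_diagnostics": ("explained_variance_ratio_",),
    "has_native_plot": False,
    "is_linear": True,
    "is_stochastic": False,
    "nested_components": True,
}


def plan_steps(caps: dict[str, Any]) -> list[str]:
    """Hypothetical helper: choose follow-up steps from capability flags."""
    steps = []
    if caps.get("has_transform", False):
        steps.append("score out-of-sample data")
    if caps.get("has_components", False):
        steps.append("plot component loadings")
    if caps.get("nested_components", False):
        # One max-n fit can be sliced instead of refitting for every k.
        steps.append("sweep n_components by slicing a single fit")
    if caps.get("is_stochastic", False):
        steps.append("repeat runs with different seeds")
    return steps


print(plan_steps(pca_like))
```

Reading flags with `caps.get(..., False)` treats a missing key as "not supported", which is one conservative choice for reducers that declare only part of the mapping.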
Examples
>>> from sklearn.decomposition import PCA
>>> from coco_pipe.dim_reduction import BaseReducer
>>>
>>> class CustomPCAReducer(BaseReducer):
...     @property
...     def capabilities(self):
...         return self._merge_capabilities(
...             super().capabilities,
...             is_linear=True,
...             has_components=True,
...             supported_diagnostics=("explained_variance_ratio_",),
...         )
...
...     def fit(self, X, y=None):
...         self.model = PCA(n_components=self.n_components, **self.params)
...         self.model.fit(X)
...         return self
...
...     def transform(self, X):
...         return self.model.transform(X)
- abstractmethod fit(X, y=None)
Fit the model to the data.
- Parameters:
X (ArrayLike) – Training data. Most reducers expect (n_samples, n_features), but reducers with custom capabilities["input_layout"] may accept other layouts such as snapshot matrices or grouped trajectory tensors.
y (ArrayLike, optional) – Optional supervision aligned with the sample axis used by the reducer’s declared input layout.
- Returns:
self – The fitted reducer instance.
- Return type:
BaseReducer
Notes
Most reducers expect X to have shape (n_samples, n_features). Some reducers operate on alternative layouts and should document those layouts through capabilities.
- abstractmethod transform(X)
Apply dimensionality reduction to X.
- Parameters:
X (ArrayLike) – New data to transform. Its layout should match the reducer’s declared capabilities.
- Returns:
X_new – Reduced representation. The exact output shape depends on the reducer, but the last dimension usually matches n_components.
- Return type:
np.ndarray
- Raises:
RuntimeError – Raised by concrete implementations when transform is called before fitting or when the reducer does not support out-of-sample transforms.
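The fitted-state check described under Raises can be sketched with a toy reducer. `TruncatingReducer` is a hypothetical stand-in that does not subclass BaseReducer, so the example runs without coco_pipe. It works on plain nested lists rather than np.ndarray, and "reduction" here just means keeping the first n_components columns.

```python
class TruncatingReducer:
    """Toy stand-in (not a coco_pipe class): keeps the first n_components columns."""

    def __init__(self, n_components=2):
        self.n_components = n_components
        self.model = None  # populated by fit, as the docs describe

    def fit(self, X, y=None):
        n_features = len(X[0])
        if self.n_components > n_features:
            raise ValueError("n_components exceeds the number of features")
        # The "model" here is just the list of column indices to keep.
        self.model = list(range(self.n_components))
        return self

    def transform(self, X):
        # Guard against use before fitting, mirroring the documented RuntimeError.
        if self.model is None:
            raise RuntimeError("Call fit before transform.")
        return [[row[j] for j in self.model] for row in X]


reducer = TruncatingReducer(n_components=2)
try:
    reducer.transform([[1.0, 2.0, 3.0]])
except RuntimeError as exc:
    print(f"unfitted: {exc}")

reducer.fit([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
print(reducer.transform([[7.0, 8.0, 9.0]]))  # last dimension equals n_components
```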
- fit_transform(X, y=None)
Fit the model to data and return the transformed data.
This method usually calls fit and then transform, but reducers may override it for efficiency if the underlying algorithm supports a native combined path.
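The usual fit-then-transform path can be sketched as follows. `MeanCenteringReducer` is a hypothetical stand-in written for this example, not the library's implementation. It centers the first n_components columns of a nested list.

```python
class MeanCenteringReducer:
    """Toy stand-in (not a coco_pipe class): centers the first n_components columns."""

    def __init__(self, n_components=2):
        self.n_components = n_components
        self.model = None

    def fit(self, X, y=None):
        n_rows = len(X)
        # Store the per-column means of the retained columns as the "model".
        self.model = [
            sum(row[j] for row in X) / n_rows for j in range(self.n_components)
        ]
        return self

    def transform(self, X):
        if self.model is None:
            raise RuntimeError("Call fit before transform.")
        return [[row[j] - mean for j, mean in enumerate(self.model)] for row in X]

    def fit_transform(self, X, y=None):
        # Default path: fit returns self, so the two calls chain.
        return self.fit(X, y).transform(X)


X = [[1.0, 10.0, 5.0], [3.0, 30.0, 7.0]]
Z = MeanCenteringReducer(n_components=2).fit_transform(X)
print(Z)
```

A reducer whose backend computes the embedding during fitting could override `fit_transform` to return that result directly instead of making a second pass over X.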
- save(filepath)
Persist the reducer to a file.
The default implementation serializes the reducer instance with joblib.dump. Custom reducers should either remain joblib-serializable or override this method and load with a custom persistence strategy.
- Parameters:
filepath (str or Path) – Path to the output file.
- Return type:
None
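For reducers that cannot stay joblib-serializable, one possible custom persistence strategy is to write plain state to JSON. The sketch below is an assumption-heavy illustration: `JsonBackedReducer` is a hypothetical stand-in, and the signature of `load` (a classmethod taking a path) is assumed, since this page does not document it.

```python
import json
import os
import tempfile
from pathlib import Path


class JsonBackedReducer:
    """Toy stand-in (not a coco_pipe class) with JSON-based persistence."""

    def __init__(self, n_components=2, **kwargs):
        self.n_components = n_components
        self.params = kwargs
        self.model = None  # here: a plain list of column means

    def fit(self, X, y=None):
        n_rows = len(X)
        self.model = [sum(col) / n_rows for col in zip(*X)]
        return self

    def save(self, filepath):
        # Write only plain, JSON-compatible state instead of pickling the object.
        state = {
            "n_components": self.n_components,
            "params": self.params,
            "model": self.model,
        }
        Path(filepath).write_text(json.dumps(state))

    @classmethod
    def load(cls, filepath):
        # Assumed signature: a classmethod that rebuilds the reducer from a path.
        state = json.loads(Path(filepath).read_text())
        reducer = cls(n_components=state["n_components"], **state["params"])
        reducer.model = state["model"]
        return reducer


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "reducer.json")
    fitted = JsonBackedReducer(n_components=1, whiten=False).fit(
        [[1.0, 2.0], [3.0, 4.0]]
    )
    fitted.save(path)
    restored = JsonBackedReducer.load(path)

print(restored.n_components, restored.params, restored.model)
```

This only works when the fitted state reduces to JSON-compatible values. Array-valued state would need conversion to lists or a different file format.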
- property capabilities: dict[str, Any]
Return reducer capability flags consumed by the manager layer.
Custom reducers with nonstandard inputs should override at least input_ndim and input_layout. Reducers exposing diagnostics or scalar quality metadata should declare them explicitly through supported_diagnostics and supported_metadata.
- Returns:
Mapping of reducer capability flags.
- Return type:
dict[str, Any]
Notes
The default capabilities describe a typical estimator consuming (samples, features) input and exposing transform.
- get_diagnostics()
Return diagnostic arrays or structured artifacts.
Diagnostics are intended for non-scalar outputs such as explained variance curves, eigenvalues, modes, graphs, or training histories. Only names declared in capabilities["supported_diagnostics"] are queried.
- Returns:
diagnostics – Dictionary of diagnostic attributes declared in capabilities["supported_diagnostics"].
- Return type:
dict
- Raises:
RuntimeError – If the reducer has not been fitted.
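The contract above (query only declared names, fail if unfitted) can be sketched with a standalone helper. `collect_diagnostics` is hypothetical and is not the library's implementation. Skipping declared names that the model lacks is an assumption made for this example.

```python
from types import SimpleNamespace


def collect_diagnostics(model, capabilities):
    """Hypothetical helper mirroring the documented contract of get_diagnostics."""
    if model is None:
        raise RuntimeError("Reducer has not been fitted.")
    names = capabilities.get("supported_diagnostics", ())
    # Only declared names are queried; undeclared model attributes are ignored.
    return {name: getattr(model, name) for name in names if hasattr(model, name)}


# A fake fitted model standing in for something like a scikit-learn estimator.
fitted = SimpleNamespace(
    explained_variance_ratio_=[0.7, 0.2],
    singular_values_=[3.1, 1.4],
)
caps = {"supported_diagnostics": ("explained_variance_ratio_",)}

print(collect_diagnostics(fitted, caps))
```

Here `singular_values_` exists on the model but is not returned, because it is not declared in the capability mapping.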
- get_quality_metadata()
Return scalar metadata about the reduction process or quality.
Typical examples include iteration counts, optimization stress, final loss values, or backend-specific convergence flags. Only names declared in capabilities["supported_metadata"] are queried.
- Returns:
metadata – Dictionary containing only scalar values corresponding to keys declared in capabilities["supported_metadata"].
- Return type:
dict
- Raises:
RuntimeError – If the reducer has not been fitted.
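The scalar-only rule can be illustrated with a small filter. `collect_quality_metadata` is a hypothetical helper, and the attribute names on the fake model are made up for the example. How the library itself decides what counts as a scalar is not stated on this page, so the `numbers.Number` check is an assumption.

```python
import numbers
from types import SimpleNamespace


def collect_quality_metadata(model, capabilities):
    """Hypothetical helper mirroring the documented contract of get_quality_metadata."""
    if model is None:
        raise RuntimeError("Reducer has not been fitted.")
    metadata = {}
    for name in capabilities.get("supported_metadata", ()):
        value = getattr(model, name, None)
        # Keep scalars only; bool counts as a Number, so convergence flags pass.
        if isinstance(value, numbers.Number):
            metadata[name] = value
    return metadata


fitted = SimpleNamespace(
    n_iter_=42,
    stress_=0.013,
    converged_=True,
    embedding_=[[0.1, 0.2]],
)
caps = {"supported_metadata": ("n_iter_", "stress_", "converged_", "embedding_")}

print(collect_quality_metadata(fitted, caps))
```

The list-valued `embedding_` is dropped even though it is declared, which keeps array-like outputs on the diagnostics side.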
- get_components()
Return reducer-defined component-like outputs.
- Returns:
Reducer-defined component array.
- Return type:
np.ndarray
- Raises:
ValueError – If the reducer does not expose public components.