coco_pipe.dim_reduction.reducers.linear#
Linear dimensionality reduction reducers.
This module provides linear projection wrappers built on top of scikit-learn and optional Dask backends. These reducers follow the shared coco_pipe.dim_reduction.reducers.base.BaseReducer contract so they can be used directly with coco_pipe.dim_reduction.DimReduction, reporting, and visualization utilities.
Classes#
- PCAReducer
Principal Component Analysis wrapper based on sklearn.decomposition.PCA.
- IncrementalPCAReducer
Incremental PCA wrapper for batch-wise fitting on larger datasets.
- DaskPCAReducer
Optional Dask-ML PCA wrapper for lazy or distributed arrays.
- DaskTruncatedSVDReducer
Optional Dask-ML Truncated SVD wrapper for lazy or distributed arrays.
References
- [1] Pearson, K. (1901). “On Lines and Planes of Closest Fit to Systems of
Points in Space”. Philosophical Magazine, 2(11), 559-572.
- [2] Hotelling, H. (1933). “Analysis of a complex of statistical variables
into principal components”. Journal of Educational Psychology, 24(6), 417-441.
- [3] Scikit-learn PCA documentation:
https://scikit-learn.org/stable/modules/generated/sklearn.decomposition.PCA.html
Author: Hamza Abdelhedi (hamza.abdelhedi@umontreal.ca)
Classes#
PCAReducer | Principal Component Analysis reducer.
IncrementalPCAReducer | Incremental PCA reducer.
DaskPCAReducer | Dask-ML PCA reducer for lazy or distributed data.
DaskTruncatedSVDReducer | Dask-ML Truncated SVD reducer.
Module Contents#
- class coco_pipe.dim_reduction.reducers.linear.PCAReducer(n_components=2, **kwargs)#
Bases: coco_pipe.dim_reduction.reducers.base.BaseReducer
Principal Component Analysis reducer.
This reducer wraps sklearn.decomposition.PCA and provides a linear low-dimensional embedding based on singular value decomposition.
- Parameters:
- Variables:
model (sklearn.decomposition.PCA or None) – Fitted PCA estimator after fit.
Notes
This is a deterministic linear reducer unless a randomized solver is used.
See also
IncrementalPCAReducer: Linear PCA variant for batch-wise fitting.
DaskPCAReducer: Linear PCA variant for lazy or distributed arrays.
DaskTruncatedSVDReducer: Linear factorization alternative for lazy arrays.
IsomapReducer: Nonlinear manifold learner based on geodesic distances.
TSNEReducer: Nonlinear neighborhood-preserving embedding.
UMAPReducer: Nonlinear graph-based embedding balancing local and global structure.
PHATEReducer: Nonlinear diffusion-based embedding for smooth trajectories.
Examples
>>> import numpy as np
>>> from coco_pipe.dim_reduction import PCAReducer
>>> X = np.random.rand(100, 10)
>>> reducer = PCAReducer(n_components=2, random_state=42)
>>> _ = reducer.fit(X)
>>> X_reduced = reducer.transform(X)
>>> X_reduced.shape
(100, 2)
>>> reducer.explained_variance_ratio_.shape
(2,)
>>> reducer.components_.shape
(2, 10)
>>> reducer = PCAReducer(n_components=3, whiten=True)
>>> reducer.fit_transform(X).shape
(100, 3)
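The SVD-based projection that PCAReducer delegates to scikit-learn can be sketched with NumPy alone. This is a minimal illustration of the general technique, not the wrapper's or scikit-learn's actual implementation; the helper name pca_via_svd is hypothetical and exists only for this example.

```python
import numpy as np


def pca_via_svd(X, n_components=2):
    """Hypothetical helper: project X onto its top principal axes via SVD."""
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    X_centered = X - mean  # PCA operates on mean-centered data
    # Thin SVD: X_centered = U @ diag(S) @ Vt; rows of Vt are principal axes
    U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
    components = Vt[:n_components]
    # Variance along each axis, using the unbiased (n - 1) normalization
    explained_variance = (S ** 2) / (X.shape[0] - 1)
    ratio = explained_variance[:n_components] / explained_variance.sum()
    # Coordinates of each sample in the retained component basis
    embedding = X_centered @ components.T
    return embedding, components, ratio


rng = np.random.default_rng(0)
X = rng.random((100, 10))
embedding, components, ratio = pca_via_svd(X, n_components=2)
```

The returned shapes mirror those in the example above: an embedding of shape (100, 2), a component matrix of shape (2, 10), and one variance ratio per retained component.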
- property capabilities: dict#
Return capability metadata for PCA.
- Returns:
Capability mapping describing PCA as a linear component-based reducer.
- Return type:
dict
- fit(X, y=None)#
Fit PCA on the input data.
- Parameters:
X (ArrayLike of shape (n_samples, n_features)) – Training data.
y (ArrayLike, optional) – Ignored. Present for API compatibility.
- Returns:
Fitted reducer instance.
- Return type:
Examples
>>> import numpy as np
>>> from coco_pipe.dim_reduction import PCAReducer
>>> X = np.random.rand(20, 5)
>>> reducer = PCAReducer(n_components=2)
>>> _ = reducer.fit(X)
>>> reducer.model is not None
True
- transform(X)#
Project data onto the fitted principal component basis.
- Parameters:
X (ArrayLike of shape (n_samples, n_features)) – Data to project.
- Returns:
Projected coordinates in principal component space.
- Return type:
np.ndarray of shape (n_samples, n_dims)
- Raises:
RuntimeError – If the reducer has not been fitted.
- property explained_variance_ratio_: numpy.ndarray#
Fraction of the total variance explained by each selected component (values between 0 and 1).
- Returns:
Explained variance ratio for each retained component.
- Return type:
np.ndarray of shape (n_dims,)
- Raises:
RuntimeError – If the reducer has not been fitted.
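A common use of the explained variance ratios is choosing how many components to keep. The sketch below assumes the ratios are ordered from largest to smallest, as they are for PCA; the helper name components_for_variance and the sample values are hypothetical.

```python
def components_for_variance(ratios, threshold=0.95):
    """Hypothetical helper: smallest k whose cumulative ratio reaches threshold."""
    cumulative = 0.0
    for k, r in enumerate(ratios, start=1):
        cumulative += float(r)
        if cumulative >= threshold:
            return k
    return None  # the retained components never reach the threshold


# Illustrative ratios, standing in for reducer.explained_variance_ratio_
ratios = [0.5, 0.25, 0.125, 0.0625]
k = components_for_variance(ratios, threshold=0.75)
```

With a fitted reducer, passing its explained_variance_ratio_ array in place of the illustrative list should work the same way, since the helper only iterates over the values.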
- property participation_ratio_: float#
Effective dimensionality computed as the Participation Ratio.
- Returns:
Participation ratio of the retained components.
- Return type:
float
- Raises:
RuntimeError – If the reducer has not been fitted.
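The participation ratio is commonly defined as (sum of eigenvalues)^2 divided by the sum of squared eigenvalues. The sketch below implements that standard definition; whether this property uses exactly this formula, and whether it is fed explained variances or their ratios, is an assumption here (the result is the same either way, because the quantity is invariant to rescaling). The function name is hypothetical.

```python
def participation_ratio(eigenvalues):
    """Standard participation ratio: (sum lambda)^2 / sum(lambda^2)."""
    values = [float(v) for v in eigenvalues]
    total = sum(values)
    sum_sq = sum(v * v for v in values)
    if sum_sq == 0.0:
        raise ValueError("participation ratio is undefined when all values are zero")
    return (total * total) / sum_sq


# Variance spread evenly over four components: effective dimensionality 4
equal = participation_ratio([0.25, 0.25, 0.25, 0.25])
# Variance concentrated in one component: effective dimensionality 1
dominated = participation_ratio([1.0, 0.0, 0.0, 0.0])
```

The value ranges from 1 (a single dominant component) up to the number of retained components (variance spread evenly).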
- property components_: numpy.ndarray#
Principal axes in feature space.
- Returns:
Principal component loading matrix.
- Return type:
np.ndarray of shape (n_dims, n_features)
- Raises:
RuntimeError – If the reducer has not been fitted.
- get_components()#
Return the principal component loading matrix.
- Returns:
Principal component loading matrix.
- Return type:
np.ndarray
- Raises:
RuntimeError – If the reducer has not been fitted.
- class coco_pipe.dim_reduction.reducers.linear.IncrementalPCAReducer(n_components=2, batch_size=None, **kwargs)#
Bases: coco_pipe.dim_reduction.reducers.base.BaseReducer
Incremental PCA reducer.
This reducer wraps sklearn.decomposition.IncrementalPCA for batch-wise fitting when the full dataset is too large to process in one pass.
- Parameters:
- Variables:
batch_size (int or None) – Batch size used when fitting the incremental estimator.
model (sklearn.decomposition.IncrementalPCA or None) – Fitted IncrementalPCA estimator after fit or partial_fit.
See also
PCAReducer: Standard in-memory linear PCA reducer.
DaskPCAReducer: Linear PCA variant for lazy or distributed arrays.
DaskTruncatedSVDReducer: Linear factorization alternative for lazy arrays.
IsomapReducer: Nonlinear manifold learner based on geodesic distances.
TSNEReducer: Nonlinear neighborhood-preserving embedding.
UMAPReducer: Nonlinear graph-based embedding balancing local and global structure.
Examples
>>> import numpy as np
>>> from coco_pipe.dim_reduction import IncrementalPCAReducer
>>> X = np.random.rand(100, 12)
>>> reducer = IncrementalPCAReducer(n_components=3, batch_size=25)
>>> _ = reducer.fit(X)
>>> reducer.transform(X[:10]).shape
(10, 3)
>>> stream = IncrementalPCAReducer(n_components=2, batch_size=20)
>>> _ = stream.partial_fit(X[:50])
>>> _ = stream.partial_fit(X[50:])
>>> stream.transform(X).shape
(100, 2)
- property capabilities: dict#
Return capability metadata for Incremental PCA.
- Returns:
Capability mapping describing Incremental PCA as a linear component-based reducer.
- Return type:
dict
- batch_size = None#
- fit(X, y=None)#
Fit Incremental PCA in batch mode.
- Parameters:
X (ArrayLike of shape (n_samples, n_features)) – Training data.
y (ArrayLike, optional) – Ignored. Present for API compatibility.
- Returns:
Fitted reducer instance.
- Return type:
Examples
>>> import numpy as np
>>> from coco_pipe.dim_reduction import IncrementalPCAReducer
>>> X = np.random.rand(30, 6)
>>> reducer = IncrementalPCAReducer(n_components=2, batch_size=10)
>>> _ = reducer.fit(X)
>>> reducer.model is not None
True
- partial_fit(X, y=None)#
Incrementally fit the estimator on a batch of samples.
- Parameters:
X (ArrayLike of shape (n_samples, n_features)) – Batch of training samples.
y (ArrayLike, optional) – Ignored. Present for API compatibility.
- Returns:
Reducer instance after updating the incremental estimator.
- Return type:
Examples
>>> import numpy as np
>>> from coco_pipe.dim_reduction import IncrementalPCAReducer
>>> X = np.random.rand(40, 6)
>>> reducer = IncrementalPCAReducer(n_components=2, batch_size=20)
>>> _ = reducer.partial_fit(X[:20])
>>> _ = reducer.partial_fit(X[20:])
>>> reducer.model is not None
True
- transform(X)#
Project data onto the fitted incremental PCA basis.
- Parameters:
X (ArrayLike of shape (n_samples, n_features)) – Data to project.
- Returns:
Projected coordinates in component space.
- Return type:
np.ndarray of shape (n_samples, n_dims)
- Raises:
RuntimeError – If the reducer has not been fitted.
- property explained_variance_ratio_: numpy.ndarray#
Fraction of the total variance explained by each selected component (values between 0 and 1).
- Returns:
Explained variance ratio for each retained component.
- Return type:
np.ndarray of shape (n_dims,)
- Raises:
RuntimeError – If the reducer has not been fitted.
- property participation_ratio_: float#
Effective dimensionality computed as the Participation Ratio.
- Returns:
Participation ratio of the retained components.
- Return type:
float
- Raises:
RuntimeError – If the reducer has not been fitted.
- property components_: numpy.ndarray#
Principal axes in feature space.
- Returns:
Principal component loading matrix.
- Return type:
np.ndarray of shape (n_dims, n_features)
- Raises:
RuntimeError – If the reducer has not been fitted.
- get_components()#
Return the incremental PCA component loading matrix.
- Returns:
Principal component loading matrix.
- Return type:
np.ndarray
- Raises:
RuntimeError – If the reducer has not been fitted.
- class coco_pipe.dim_reduction.reducers.linear.DaskPCAReducer(n_components=2, svd_solver='auto', **kwargs)#
Bases: coco_pipe.dim_reduction.reducers.base.BaseReducer
Dask-ML PCA reducer for lazy or distributed data.
This reducer wraps dask_ml.decomposition.PCA. The backend is imported lazily so the rest of the package remains importable without dask-ml.
- Parameters:
- Variables:
svd_solver (str) – Solver used when instantiating the Dask PCA estimator.
model (dask_ml.decomposition.PCA or None) – Fitted Dask PCA estimator after fit.
Notes
This reducer requires the optional dask-ml backend.
See also
PCAReducer: Standard in-memory linear PCA reducer.
IncrementalPCAReducer: Linear PCA variant for batch-wise fitting.
DaskTruncatedSVDReducer: Linear SVD-based alternative for lazy arrays.
IsomapReducer: Nonlinear manifold learner based on geodesic distances.
TSNEReducer: Nonlinear neighborhood-preserving embedding.
UMAPReducer: Nonlinear graph-based embedding balancing local and global structure.
Examples
>>> import dask.array as da
>>> import numpy as np
>>> from coco_pipe.dim_reduction import DaskPCAReducer
>>> X = da.from_array(np.random.rand(100, 10), chunks=(25, 10))
>>> reducer = DaskPCAReducer(n_components=2, svd_solver="tsqr")
>>> _ = reducer.fit(X)
>>> reducer.transform(X).shape
(100, 2)
- property capabilities: dict#
Return capability metadata for Dask PCA.
- Returns:
Capability mapping describing Dask PCA as a linear component-based reducer.
- Return type:
dict
- svd_solver = 'auto'#
- fit(X, y=None)#
Fit Dask PCA on the input data.
- Parameters:
- Returns:
Fitted reducer instance.
- Return type:
- Raises:
ImportError – If dask-ml is not installed.
RuntimeError – If dask-ml is installed but fails during initialization.
Examples
>>> import dask.array as da
>>> import numpy as np
>>> from coco_pipe.dim_reduction import DaskPCAReducer
>>> X = da.from_array(np.random.rand(40, 8), chunks=(20, 8))
>>> reducer = DaskPCAReducer(n_components=2)
>>> _ = reducer.fit(X)
>>> reducer.model is not None
True
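The lazy-import behavior and the two error cases listed for fit (ImportError when the backend is missing, RuntimeError when it is present but fails to initialize) follow a common optional-dependency pattern. The sketch below shows one way such a pattern can look; it is an assumption about the general approach, not the package's actual code, and the helper name load_optional_backend is hypothetical.

```python
import importlib


def load_optional_backend(module_name, install_hint):
    """Hypothetical helper: import an optional dependency only when needed."""
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        # Missing package: tell the user how to install it
        raise ImportError(
            f"{module_name!r} is required for this reducer. "
            f"Install it with: {install_hint}"
        ) from exc
    except Exception as exc:
        # Package is present but failed while initializing
        raise RuntimeError(
            f"{module_name!r} is installed but failed to import: {exc}"
        ) from exc


# A stdlib module stands in for the optional backend in this demo
json_module = load_optional_backend("json", "pip install <package>")
```

Calling such a helper inside fit rather than at module import time is what keeps the rest of a package importable when the optional backend is absent.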
- transform(X)#
Project data using the fitted Dask PCA model.
- Parameters:
X (ArrayLike) – Data to project.
- Returns:
Backend-specific transformed output, typically a Dask array.
- Return type:
Any
- Raises:
RuntimeError – If the reducer has not been fitted.
- property explained_variance_ratio_: numpy.ndarray#
Fraction of the total variance explained by each selected component (values between 0 and 1).
- Returns:
Explained variance ratio for each retained component.
- Return type:
np.ndarray of shape (n_dims,)
- Raises:
RuntimeError – If the reducer has not been fitted.
- property participation_ratio_: float#
Effective dimensionality computed as the Participation Ratio.
- Returns:
Participation ratio of the retained components.
- Return type:
float
- Raises:
RuntimeError – If the reducer has not been fitted.
- property components_: numpy.ndarray#
Principal axes in feature space.
- Returns:
Principal component loading matrix.
- Return type:
np.ndarray of shape (n_dims, n_features)
- Raises:
RuntimeError – If the reducer has not been fitted.
- get_components()#
Return the Dask PCA component loading matrix.
- Returns:
Principal component loading matrix or Dask-backed equivalent.
- Return type:
np.ndarray
- Raises:
RuntimeError – If the reducer has not been fitted.
- class coco_pipe.dim_reduction.reducers.linear.DaskTruncatedSVDReducer(n_components=2, algorithm='tsqr', **kwargs)#
Bases: coco_pipe.dim_reduction.reducers.base.BaseReducer
Dask-ML Truncated SVD reducer.
This reducer wraps dask_ml.decomposition.TruncatedSVD and provides a linear projection for lazy or distributed arrays.
- Parameters:
- Variables:
algorithm (str) – SVD algorithm used when instantiating the backend estimator.
model (dask_ml.decomposition.TruncatedSVD or None) – Fitted TruncatedSVD estimator after fit.
Notes
This reducer requires the optional dask-ml backend.
See also
PCAReducer: Standard in-memory linear PCA reducer.
IncrementalPCAReducer: Linear PCA variant for batch-wise fitting.
DaskPCAReducer: Linear PCA variant for lazy or distributed arrays.
IsomapReducer: Nonlinear manifold learner based on geodesic distances.
TSNEReducer: Nonlinear neighborhood-preserving embedding.
UMAPReducer: Nonlinear graph-based embedding balancing local and global structure.
Examples
>>> import dask.array as da
>>> import numpy as np
>>> from coco_pipe.dim_reduction import DaskTruncatedSVDReducer
>>> X = da.from_array(np.random.rand(120, 15), chunks=(30, 15))
>>> reducer = DaskTruncatedSVDReducer(n_components=3, algorithm="randomized")
>>> _ = reducer.fit(X)
>>> reducer.transform(X).shape
(120, 3)
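In its usual formulation, truncated SVD differs from PCA in that it factorizes the data matrix directly, without first subtracting the feature means. The NumPy sketch below illustrates that idea on an in-memory array; it is not the Dask-ML algorithm (which uses tall-skinny QR or randomized solvers on chunked arrays), and the helper name truncated_svd is hypothetical.

```python
import numpy as np


def truncated_svd(X, n_components=2):
    """Hypothetical helper: project X onto its top right singular vectors."""
    X = np.asarray(X, dtype=float)
    # No mean subtraction here, unlike PCA
    U, S, Vt = np.linalg.svd(X, full_matrices=False)
    components = Vt[:n_components]
    # Projecting onto the right singular vectors gives the embedding
    embedding = X @ components.T
    return embedding, components


rng = np.random.default_rng(0)
X = rng.random((120, 15))
embedding, components = truncated_svd(X, n_components=3)
```

Because the data are not centered, the first component of uncentered data often tracks the mean direction, which is worth keeping in mind when comparing results with a PCA-based reducer.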
- property capabilities: dict#
Return capability metadata for Dask Truncated SVD.
- Returns:
Capability mapping describing Dask Truncated SVD as a linear component-based reducer.
- Return type:
dict
- algorithm = 'tsqr'#
- fit(X, y=None)#
Fit Dask Truncated SVD on the input data.
- Parameters:
- Returns:
Fitted reducer instance.
- Return type:
- Raises:
ImportError – If dask-ml is not installed.
RuntimeError – If dask-ml is installed but fails during initialization.
Examples
>>> import dask.array as da
>>> import numpy as np
>>> from coco_pipe.dim_reduction import DaskTruncatedSVDReducer
>>> X = da.from_array(np.random.rand(40, 8), chunks=(20, 8))
>>> reducer = DaskTruncatedSVDReducer(n_components=2)
>>> _ = reducer.fit(X)
>>> reducer.model is not None
True
- transform(X)#
Project data using the fitted Dask Truncated SVD model.
- Parameters:
X (ArrayLike) – Data to project.
- Returns:
Backend-specific transformed output, typically a Dask array.
- Return type:
Any
- Raises:
RuntimeError – If the reducer has not been fitted.
- property explained_variance_ratio_: numpy.ndarray#
Fraction of the total variance explained by each selected component (values between 0 and 1).
- Returns:
Explained variance ratio for each retained component.
- Return type:
np.ndarray of shape (n_dims,)
- Raises:
RuntimeError – If the reducer has not been fitted.
- property participation_ratio_: float#
Effective dimensionality computed as the Participation Ratio.
- Returns:
Participation ratio of the retained components.
- Return type:
float
- Raises:
RuntimeError – If the reducer has not been fitted.
- property components_: numpy.ndarray#
Component axes (right singular vectors) in feature space.
- Returns:
Component loading matrix.
- Return type:
np.ndarray of shape (n_dims, n_features)
- Raises:
RuntimeError – If the reducer has not been fitted.
- get_components()#
Return the Truncated SVD component loading matrix.
- Returns:
Component loading matrix or Dask-backed equivalent.
- Return type:
np.ndarray
- Raises:
RuntimeError – If the reducer has not been fitted.