coco_pipe.dim_reduction.reducers.neighbor#

Neighbor-embedding and graph-based reducers.

This module provides wrappers for neighborhood-preserving and graph-based nonlinear dimensionality reduction methods, including t-SNE, UMAP, PaCMAP, TriMap, PHATE, and Parametric UMAP.

Classes#

TSNEReducer

t-Distributed Stochastic Neighbor Embedding wrapper.

UMAPReducer

Uniform Manifold Approximation and Projection wrapper.

PacmapReducer

Pairwise Controlled Manifold Approximation wrapper.

TrimapReducer

Triplet-based manifold embedding wrapper.

PHATEReducer

Diffusion-based PHATE embedding wrapper.

ParametricUMAPReducer

Neural-network-backed Parametric UMAP wrapper.

References

[1] van der Maaten, L., and Hinton, G. (2008). “Visualizing data using t-SNE”. Journal of Machine Learning Research, 9, 2579-2605.

[2] McInnes, L., Healy, J., and Melville, J. (2018). “UMAP: Uniform Manifold Approximation and Projection for Dimension Reduction”. arXiv.

[3] Wang, Y., et al. (2021). “Understanding How Dimension Reduction Tools Work: An Empirical Approach to Deciphering t-SNE, UMAP, TriMap, and PaCMAP for Data Visualization”. Journal of Machine Learning Research, 22(201), 1-73.

[4] Amid, E., and Warmuth, M. K. (2019). “TriMap: Large-scale Dimensionality Reduction Using Triplets”. arXiv.

[5] Moon, K. R., et al. (2019). “Visualizing structure and transitions in high-dimensional biological data”. Nature Biotechnology, 37, 1482-1492.

Authors: Hamza Abdelhedi (hamza.abdelhedi@umontreal.ca), Sina Esmaeili (sina.esmaeili@umontreal.ca)

Module Contents#

class coco_pipe.dim_reduction.reducers.neighbor.TSNEReducer(n_components=2, **kwargs)#

Bases: coco_pipe.dim_reduction.reducers.base.BaseReducer

t-SNE reducer.

t-Distributed Stochastic Neighbor Embedding (t-SNE) is a neighborhood-preserving method designed primarily for visualization. It optimizes a low-dimensional embedding by matching pairwise similarities between the original space and the embedding.
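To make "matching pairwise similarities" concrete, the pure-Python sketch below evaluates a Kullback-Leibler divergence between a toy similarity matrix P (standing in for the high-dimensional similarities) and the Student-t similarities Q of a candidate embedding. The function names and the toy data are illustrative assumptions and are not part of coco_pipe. The real optimizer in sklearn.manifold.TSNE derives P from perplexity-calibrated kernels and uses a much more efficient algorithm.

```python
import math


def student_t_similarities(points):
    """Return the matrix Q of normalized Student-t similarities between embedded points."""
    n = len(points)
    weights = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                sq_dist = sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
                weights[i][j] = 1.0 / (1.0 + sq_dist)
    total = sum(sum(row) for row in weights)
    return [[w / total for w in row] for row in weights]


def kl_divergence(p, q):
    """KL(P || Q) over off-diagonal pairs; zero entries of P contribute nothing."""
    n = len(p)
    return sum(
        p[i][j] * math.log(p[i][j] / q[i][j])
        for i in range(n)
        for j in range(n)
        if i != j and p[i][j] > 0.0
    )


# Toy high-dimensional similarities (symmetric, off-diagonal entries sum to 1).
P = [
    [0.0, 0.4, 0.1],
    [0.4, 0.0, 0.0],
    [0.1, 0.0, 0.0],
]

# Candidate 2-D embedding: points 0 and 1 close together, point 2 far away.
Y = [(0.0, 0.0), (0.5, 0.0), (4.0, 0.0)]
Q = student_t_similarities(Y)
loss = kl_divergence(P, Q)
```

The divergence is non-negative, and it is zero only when the two similarity matrices agree. An embedding that reproduces the neighborhood structure therefore scores lower.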

Parameters:
  • n_components (int, default=2) – Number of embedding dimensions.

  • **kwargs (dict) – Additional keyword arguments forwarded to sklearn.manifold.TSNE after signature filtering. Common options include perplexity, learning_rate, max_iter, init, and random_state.
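The reference does not show how signature filtering is implemented. One plausible approach, sketched below under that assumption, inspects the backend constructor and keeps only the keyword arguments it declares. The helper filter_kwargs and the ToyEstimator class are hypothetical stand-ins, not coco_pipe or scikit-learn APIs.

```python
import inspect


def filter_kwargs(func, kwargs):
    """Keep only the keyword arguments that `func` declares in its signature."""
    params = inspect.signature(func).parameters
    # If the callable itself accepts **kwargs, nothing needs to be dropped.
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()):
        return dict(kwargs)
    return {name: value for name, value in kwargs.items() if name in params}


class ToyEstimator:
    """Hypothetical stand-in for a backend estimator such as sklearn.manifold.TSNE."""

    def __init__(self, n_components=2, perplexity=30.0, random_state=None):
        self.n_components = n_components
        self.perplexity = perplexity
        self.random_state = random_state


user_kwargs = {"perplexity": 20, "random_state": 42, "not_a_real_option": True}
accepted = filter_kwargs(ToyEstimator.__init__, user_kwargs)
model = ToyEstimator(n_components=2, **accepted)
```

With filtering of this kind, an unknown option such as not_a_real_option is dropped silently instead of raising a TypeError. The cost of that convenience is that a misspelled option name is dropped without warning as well.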

Variables:
  • embedding_ (np.ndarray or None) – Learned training-set embedding after fit or fit_transform.

  • model (sklearn.manifold.TSNE or None) – Fitted t-SNE estimator after fit or fit_transform.

Notes

transform is not supported because scikit-learn t-SNE does not provide an out-of-sample projection API.
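Because transform always raises, code that needs coordinates for new samples has to refit. The sketch below shows one way to handle this generically. NoTransformReducer is a hypothetical stand-in that mimics the documented behavior, and embed_with_new_points is an illustrative helper, not a coco_pipe function.

```python
class NoTransformReducer:
    """Hypothetical stand-in for a reducer that cannot project new data."""

    def fit_transform(self, X):
        # Toy "embedding": keep the first two features of each row.
        return [row[:2] for row in X]

    def transform(self, X):
        raise NotImplementedError("out-of-sample transform is not supported")


def embed_with_new_points(reducer, X_train, X_new):
    """Embed X_new, refitting on the combined data if transform is unavailable."""
    try:
        return reducer.transform(X_new)
    except NotImplementedError:
        # Refit on old and new rows together, then return only the new rows.
        combined = list(X_train) + list(X_new)
        embedding = reducer.fit_transform(combined)
        return embedding[len(X_train):]


X_train = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
X_new = [[7.0, 8.0, 9.0]]
new_coords = embed_with_new_points(NoTransformReducer(), X_train, X_new)
```

With a real stochastic method such as t-SNE, refitting also moves the training points, so coordinates from before and after the refit are not directly comparable.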

See also

UMAPReducer

Nonlinear graph-based embedding with transform support.

PacmapReducer

Nonlinear embedding balancing local and global structure.

TrimapReducer

Nonlinear triplet-based embedding preserving global layout.

PHATEReducer

Diffusion-based embedding for continuous trajectories.

PCAReducer

Linear baseline for global variance preservation.

IsomapReducer

Nonlinear geodesic-distance manifold embedding.

Examples

>>> import numpy as np
>>> from coco_pipe.dim_reduction import TSNEReducer
>>> X = np.random.rand(100, 10)
>>> reducer = TSNEReducer(n_components=2, perplexity=20, random_state=42)
>>> embedding = reducer.fit_transform(X)
>>> embedding.shape
(100, 2)
>>> reducer.get_quality_metadata()["kl_divergence_"] >= 0
True
>>> _ = reducer.fit(X)
>>> reducer.embedding_.shape
(100, 2)

property capabilities: dict#

Return capability metadata for t-SNE.

Returns:

Capability mapping describing t-SNE as a nonlinear stochastic reducer without out-of-sample transform support.

Return type:

dict
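A capability mapping of this kind can be used to select reducers programmatically. The keys in the sketch below (is_linear, is_stochastic, has_transform) are assumptions chosen for illustration, because this reference does not list the actual keys. The values follow the capability descriptions given on this page.

```python
def reducers_supporting(capability, capability_maps):
    """Return the names of reducers whose mapping sets `capability` to True."""
    return [
        name
        for name, caps in capability_maps.items()
        if caps.get(capability, False)
    ]


# Hypothetical capability mappings; the key names are assumptions.
example_capabilities = {
    "TSNEReducer": {"is_linear": False, "is_stochastic": True, "has_transform": False},
    "UMAPReducer": {"is_linear": False, "is_stochastic": True, "has_transform": True},
    "PacmapReducer": {"is_linear": False, "is_stochastic": True, "has_transform": False},
    "PHATEReducer": {"is_linear": False, "has_transform": True},
}

with_transform = reducers_supporting("has_transform", example_capabilities)
```

Checking a declared capability up front avoids calling transform on a reducer that would only raise NotImplementedError.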

embedding_ = None#
fit(X, y=None)#

Fit t-SNE on the input data.

Parameters:

X (ArrayLike of shape (n_samples, n_features)) – Training data.

Returns:

Fitted reducer instance.

Return type:

TSNEReducer

Examples

>>> import numpy as np
>>> from coco_pipe.dim_reduction import TSNEReducer
>>> X = np.random.rand(30, 6)
>>> reducer = TSNEReducer(n_components=2, perplexity=5, max_iter=250)
>>> _ = reducer.fit(X)
>>> reducer.model is not None
True
transform(X)#

Raise because t-SNE does not support out-of-sample transformation.

Parameters:

X (ArrayLike) – Ignored input included for API compatibility.

Raises:

NotImplementedError – Always raised because t-SNE does not support transforming new data.

Return type:

numpy.ndarray

fit_transform(X, y=None)#

Fit t-SNE and return the embedding coordinates.

Parameters:

X (ArrayLike of shape (n_samples, n_features)) – Data to fit and embed.

Returns:

Embedded coordinates produced by t-SNE.

Return type:

np.ndarray of shape (n_samples, n_components)

class coco_pipe.dim_reduction.reducers.neighbor.UMAPReducer(n_components=2, **kwargs)#

Bases: coco_pipe.dim_reduction.reducers.base.BaseReducer

UMAP reducer.

Uniform Manifold Approximation and Projection (UMAP) constructs a graph in the high-dimensional space and optimizes a low-dimensional representation of that graph. Unlike t-SNE, UMAP supports out-of-sample transformation.

Parameters:
  • n_components (int, default=2) – Number of embedding dimensions.

  • **kwargs (dict) – Additional keyword arguments forwarded to umap.UMAP after signature filtering. Common options include n_neighbors, min_dist, metric, and random_state.

Variables:

model (umap.UMAP or None) – Fitted UMAP estimator after fit.

See also

TSNEReducer

Nonlinear neighborhood-preserving visualization method.

PacmapReducer

Nonlinear embedding balancing local and global structure.

TrimapReducer

Nonlinear triplet-based embedding preserving global layout.

PHATEReducer

Diffusion-based embedding for continuous trajectories.

IsomapReducer

Nonlinear geodesic-distance manifold embedding.

PCAReducer

Linear baseline for global variance preservation.

Examples

>>> import numpy as np
>>> from coco_pipe.dim_reduction import UMAPReducer
>>> X = np.random.rand(100, 10)
>>> reducer = UMAPReducer(n_components=2, n_neighbors=10, random_state=42)
>>> _ = reducer.fit(X)
>>> reducer.transform(X[:10]).shape
(10, 2)
>>> reducer.get_diagnostics()["graph_"] is not None
True
>>> reducer.fit_transform(X).shape
(100, 2)

property capabilities: dict#

Return capability metadata for UMAP.

Returns:

Capability mapping describing UMAP as a nonlinear stochastic reducer with transform support and a native plotting path.

Return type:

dict

fit(X, y=None)#

Fit UMAP on the input data.

Parameters:

X (ArrayLike of shape (n_samples, n_features)) – Training data.

Returns:

Fitted reducer instance.

Return type:

UMAPReducer

Raises:
  • ImportError – If umap-learn is not installed.

  • RuntimeError – If umap-learn is installed but fails during initialization.

transform(X)#

Project data using the fitted UMAP model.

Parameters:

X (ArrayLike of shape (n_samples, n_features)) – Data to project.

Returns:

Low-dimensional embedding coordinates.

Return type:

np.ndarray of shape (n_samples, n_components)

Raises:

RuntimeError – If the reducer has not been fitted.
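The fit-before-transform contract can be illustrated with a small guard. The class below is a hypothetical stand-in written for this example; it only mirrors the documented behavior (model is None until fit is called, and transform raises RuntimeError otherwise) and says nothing about how UMAPReducer is actually implemented.

```python
class FittedGuardExample:
    """Hypothetical stand-in showing a fit-before-transform guard."""

    def __init__(self):
        self.model = None  # populated by fit()

    def fit(self, X):
        # Toy "model": remember the per-feature means of the training rows.
        n_rows = len(X)
        self.model = [sum(column) / n_rows for column in zip(*X)]
        return self

    def transform(self, X):
        if self.model is None:
            raise RuntimeError("Reducer must be fitted before calling transform().")
        # Toy projection: center each row with the stored means.
        return [[v - m for v, m in zip(row, self.model)] for row in X]


reducer = FittedGuardExample()
try:
    reducer.transform([[1.0, 2.0]])
    error_message = None
except RuntimeError as exc:
    error_message = str(exc)

centered = reducer.fit([[1.0, 2.0], [3.0, 4.0]]).transform([[3.0, 4.0]])
```

Returning self from fit is what allows the chained fit(...).transform(...) call in the last line.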

class coco_pipe.dim_reduction.reducers.neighbor.PacmapReducer(n_components=2, n_neighbors=10, MN_ratio=0.5, FP_ratio=2.0, nn_backend='faiss', init='pca', **kwargs)#

Bases: coco_pipe.dim_reduction.reducers.base.BaseReducer

PaCMAP reducer.

Pairwise Controlled Manifold Approximation (PaCMAP) preserves local and global structure by balancing near, mid-near, and far pairs during the optimization.

Parameters:
  • n_components (int, default=2) – Number of embedding dimensions.

  • n_neighbors (int, default=10) – Number of neighbors used to form local pairs.

  • MN_ratio (float, default=0.5) – Ratio of mid-near pairs.

  • FP_ratio (float, default=2.0) – Ratio of far pairs.

  • nn_backend ({"faiss", "annoy", "voyager"}, default="faiss") – Nearest-neighbor backend used by recent PaCMAP versions. For older PaCMAP releases that do not expose this argument, signature filtering drops it.

  • init (str, default="pca") – Initialization strategy passed to fit_transform.

  • **kwargs (dict) – Additional keyword arguments forwarded to pacmap.PaCMAP after signature filtering.
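The two ratios are easier to interpret as pair counts. The sketch below assumes that both ratios are taken relative to n_neighbors and rounded to the nearest integer, which is how the pacmap package describes them as far as I am aware. The helper pacmap_pair_budget is hypothetical and not part of coco_pipe or pacmap.

```python
def pacmap_pair_budget(n_samples, n_neighbors=10, MN_ratio=0.5, FP_ratio=2.0):
    """Estimate how many pairs of each kind are sampled (assumed rounding rule)."""
    n_mid_near = int(round(n_neighbors * MN_ratio))  # mid-near pairs per point
    n_far = int(round(n_neighbors * FP_ratio))  # far pairs per point
    return {
        "near": n_samples * n_neighbors,
        "mid_near": n_samples * n_mid_near,
        "far": n_samples * n_far,
    }


# Wrapper defaults on a 100-sample data set.
budget = pacmap_pair_budget(n_samples=100)
```

Under these assumptions the defaults give 10 near, 5 mid-near, and 20 far pairs per point, so raising FP_ratio increases the number of far pairs sampled per point.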

Variables:
  • embedding_ (np.ndarray or None) – Learned training-set embedding after fit or fit_transform.

  • model (pacmap.PaCMAP or None) – Fitted PaCMAP estimator after fit or fit_transform.

Notes

transform is not supported because PaCMAP does not provide an efficient out-of-sample projection API.

See also

UMAPReducer

Nonlinear graph-based embedding with transform support.

TrimapReducer

Nonlinear triplet-based embedding preserving global layout.

TSNEReducer

Nonlinear neighborhood-preserving visualization method.

PHATEReducer

Diffusion-based embedding for continuous trajectories.

PCAReducer

Linear baseline for global variance preservation.

Examples

>>> import numpy as np
>>> from coco_pipe.dim_reduction import PacmapReducer
>>> X = np.random.rand(100, 10)
>>> reducer = PacmapReducer(
...     n_components=2,
...     n_neighbors=10,
...     nn_backend="faiss",
...     init="random",
... )
>>> embedding = reducer.fit_transform(X)
>>> embedding.shape
(100, 2)
>>> reducer.embedding_.shape
(100, 2)

property capabilities: dict#

Return capability metadata for PaCMAP.

Returns:

Capability mapping describing PaCMAP as a nonlinear stochastic reducer without out-of-sample transform support.

Return type:

dict

n_neighbors = 10#
MN_ratio = 0.5#
FP_ratio = 2.0#
nn_backend = 'faiss'#
init = 'pca'#
embedding_ = None#
fit(X, y=None)#

Fit PaCMAP on the input data.

Parameters:

X (ArrayLike of shape (n_samples, n_features)) – Training data.

Returns:

Fitted reducer instance.

Return type:

PacmapReducer

Raises:
  • ImportError – If pacmap is not installed.

  • RuntimeError – If pacmap is installed but fails during initialization.

transform(X)#

Raise because PaCMAP does not support out-of-sample transformation.

Parameters:

X (ArrayLike) – Ignored input included for API compatibility.

Raises:

NotImplementedError – Always raised because PaCMAP does not support transforming new data without refitting.

Return type:

numpy.ndarray

fit_transform(X, y=None)#

Fit PaCMAP and return the embedding coordinates.

Parameters:

X (ArrayLike of shape (n_samples, n_features)) – Data to fit and embed.

Returns:

Embedded coordinates produced by PaCMAP.

Return type:

np.ndarray of shape (n_samples, n_components)

class coco_pipe.dim_reduction.reducers.neighbor.TrimapReducer(n_components=2, n_inliers=10, n_outliers=5, n_random=5, **kwargs)#

Bases: coco_pipe.dim_reduction.reducers.base.BaseReducer

TriMap reducer.

TriMap uses triplet constraints to preserve relative similarities while emphasizing global layout preservation.

Parameters:
  • n_components (int, default=2) – Number of embedding dimensions.

  • n_inliers (int, default=10) – Number of nearest neighbors used to form the nearest-neighbor triplets.

  • n_outliers (int, default=5) – Number of outliers used to form the nearest-neighbor triplets.

  • n_random (int, default=5) – Number of random triplets per sample.

  • **kwargs (dict) – Additional keyword arguments forwarded to trimap.TRIMAP after signature filtering.
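The three counts jointly determine how many triplets the optimizer sees. The sketch below assumes the scheme described in the TriMap paper [4], in which every inlier is combined with n_outliers outliers and n_random random triplets are added per point. The helper is hypothetical and the exact sampling in the trimap package may differ.

```python
def trimap_triplets_per_point(n_inliers=10, n_outliers=5, n_random=5):
    """Triplets per point under the assumed scheme: inliers x outliers + random."""
    return n_inliers * n_outliers + n_random


def trimap_total_triplets(n_samples, **counts):
    """Total number of triplets for a data set of n_samples points."""
    return n_samples * trimap_triplets_per_point(**counts)


per_point = trimap_triplets_per_point()  # wrapper defaults
total = trimap_total_triplets(100)
```

Under this assumption the wrapper defaults yield 55 triplets per point, so the triplet set grows linearly with the number of samples.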

Variables:
  • embedding_ (np.ndarray or None) – Learned training-set embedding after fit or fit_transform.

  • model (trimap.TRIMAP or None) – Fitted TriMap estimator after fit or fit_transform.

Notes

transform is not supported because TriMap does not provide an out-of-sample projection API.

See also

UMAPReducer

Nonlinear graph-based embedding with transform support.

PacmapReducer

Nonlinear embedding balancing local and global structure.

TSNEReducer

Nonlinear neighborhood-preserving visualization method.

PHATEReducer

Diffusion-based embedding for continuous trajectories.

IsomapReducer

Nonlinear geodesic-distance manifold embedding.

Examples

>>> import numpy as np
>>> from coco_pipe.dim_reduction import TrimapReducer
>>> X = np.random.rand(100, 10)
>>> reducer = TrimapReducer(n_components=2)
>>> reducer.fit_transform(X).shape
(100, 2)

property capabilities: dict#

Return capability metadata for TriMap.

Returns:

Capability mapping describing TriMap as a nonlinear stochastic reducer without out-of-sample transform support.

Return type:

dict

n_inliers = 10#
n_outliers = 5#
n_random = 5#
embedding_ = None#
fit(X, y=None)#

Fit TriMap on the input data.

Parameters:

X (ArrayLike of shape (n_samples, n_features)) – Training data.

Returns:

Fitted reducer instance.

Return type:

TrimapReducer

Raises:
  • ImportError – If trimap is not installed.

  • RuntimeError – If trimap is installed but fails during initialization.

transform(X)#

Raise because TriMap does not support out-of-sample transformation.

Parameters:

X (ArrayLike) – Ignored input included for API compatibility.

Raises:

NotImplementedError – Always raised because TriMap does not support transforming new data without refitting.

Return type:

numpy.ndarray

fit_transform(X, y=None)#

Fit TriMap and return the embedding coordinates.

Parameters:

X (ArrayLike of shape (n_samples, n_features)) – Data to fit and embed.

Returns:

Embedded coordinates produced by TriMap.

Return type:

np.ndarray of shape (n_samples, n_components)

class coco_pipe.dim_reduction.reducers.neighbor.PHATEReducer(n_components=2, knn=5, decay=40, t='auto', **kwargs)#

Bases: coco_pipe.dim_reduction.reducers.base.BaseReducer

PHATE reducer.

Potential of Heat-diffusion for Affinity-based Transition Embedding (PHATE) is designed for data with continuous progression structure and uses diffusion-based distances to construct the embedding.

Parameters:
  • n_components (int, default=2) – Number of embedding dimensions.

  • knn (int, default=5) – Number of nearest neighbors used in the kernel graph.

  • decay (int, default=40) – Decay rate for the kernel.

  • t (int or str, default="auto") – Diffusion time.

  • **kwargs (dict) – Additional keyword arguments forwarded to phate.PHATE after signature filtering.
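The roles of knn and decay can be seen in the alpha-decay kernel described in the PHATE paper [5]. In the sketch below, eps_x and eps_y stand for the distance from each point to its knn-th nearest neighbor, and decay is the exponent. The function is an illustrative assumption and not the phate package's implementation, which may differ in details such as thresholding.

```python
import math


def alpha_decay_affinity(distance, eps_x, eps_y, decay=40):
    """Symmetrized alpha-decay kernel value between two points."""
    left = math.exp(-((distance / eps_x) ** decay))
    right = math.exp(-((distance / eps_y) ** decay))
    return 0.5 * left + 0.5 * right


# With a large exponent the kernel is nearly a step function around eps.
inside_sharp = alpha_decay_affinity(0.5, 1.0, 1.0, decay=40)
outside_sharp = alpha_decay_affinity(1.5, 1.0, 1.0, decay=40)

# With a small exponent the same distances give a smooth, Gaussian-like falloff.
inside_smooth = alpha_decay_affinity(0.5, 1.0, 1.0, decay=2)
outside_smooth = alpha_decay_affinity(1.5, 1.0, 1.0, decay=2)
```

A larger decay therefore makes affinities behave more like a hard k-nearest-neighbor graph, while a smaller decay lets more distant points keep some weight.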

Variables:

model (phate.PHATE or None) – Fitted PHATE estimator after fit.

See also

UMAPReducer

Nonlinear graph-based embedding with transform support.

TSNEReducer

Nonlinear neighborhood-preserving visualization method.

PacmapReducer

Nonlinear embedding balancing local and global structure.

TrimapReducer

Nonlinear triplet-based embedding preserving global layout.

ParametricUMAPReducer

Neural-network-backed UMAP approximation.

Examples

>>> import numpy as np
>>> from coco_pipe.dim_reduction import PHATEReducer
>>> X = np.random.rand(100, 10)
>>> reducer = PHATEReducer(n_components=2, knn=5)
>>> _ = reducer.fit(X)
>>> reducer.transform(X[:10]).shape
(10, 2)
>>> reducer.get_diagnostics()["diff_potential"] is not None
True

property capabilities: dict#

Return capability metadata for PHATE.

Returns:

Capability mapping describing PHATE as a nonlinear reducer with transform support and a native plotting path.

Return type:

dict

knn = 5#
decay = 40#
t = 'auto'#
fit(X, y=None)#

Fit PHATE on the input data.

Parameters:

X (ArrayLike of shape (n_samples, n_features)) – Training data.

Returns:

Fitted reducer instance.

Return type:

PHATEReducer

Raises:
  • ImportError – If phate is not installed.

  • RuntimeError – If phate is installed but fails during initialization.

transform(X)#

Project data using the fitted PHATE model.

Parameters:

X (ArrayLike of shape (n_samples, n_features)) – Data to project.

Returns:

Low-dimensional embedding coordinates.

Return type:

np.ndarray of shape (n_samples, n_components)

Raises:

RuntimeError – If the reducer has not been fitted.

class coco_pipe.dim_reduction.reducers.neighbor.ParametricUMAPReducer(n_components=2, n_neighbors=15, min_dist=0.1, metric='euclidean', n_epochs=None, batch_size=1000, verbose=False, **kwargs)#

Bases: coco_pipe.dim_reduction.reducers.base.BaseReducer

Parametric UMAP reducer.

Parametric UMAP learns a neural network that approximates the UMAP embedding, enabling reusable out-of-sample projection through the trained network.

Parameters:
  • n_components (int, default=2) – Number of embedding dimensions.

  • n_neighbors (int, default=15) – Size of the local neighborhood.

  • min_dist (float, default=0.1) – Effective minimum distance between embedded points.

  • metric (str, default="euclidean") – Metric used for distance computation.

  • n_epochs (int, optional) – Number of training epochs.

  • batch_size (int, default=1000) – Batch size used during training.

  • verbose (bool, default=False) – Whether to print backend training progress.

  • **kwargs (dict) – Additional keyword arguments forwarded to umap.parametric_umap.ParametricUMAP after signature filtering.

Variables:

model (umap.parametric_umap.ParametricUMAP or None) – Fitted Parametric UMAP estimator after fit.

See also

UMAPReducer

Non-parametric UMAP with graph-based transform support.

TSNEReducer

Nonlinear neighborhood-preserving visualization method.

PHATEReducer

Diffusion-based embedding for continuous trajectories.

IVISReducer

Neural metric-learning-based embedding.

Examples

>>> import numpy as np
>>> from coco_pipe.dim_reduction import ParametricUMAPReducer
>>> X = np.random.rand(50, 10).astype(np.float32)
>>> reducer = ParametricUMAPReducer(n_components=2, n_epochs=5, verbose=False)
>>> _ = reducer.fit(X)
>>> reducer.transform(X[:10]).shape
(10, 2)

property capabilities: dict#

Return capability metadata for Parametric UMAP.

Returns:

Capability mapping describing Parametric UMAP as a nonlinear stochastic reducer with transform support.

Return type:

dict

n_neighbors = 15#
min_dist = 0.1#
metric = 'euclidean'#
n_epochs = None#
batch_size = 1000#
verbose = False#
fit(X, y=None)#

Fit Parametric UMAP on the input data.

Parameters:

X (ArrayLike of shape (n_samples, n_features)) – Training data.

Returns:

Fitted reducer instance.

Return type:

ParametricUMAPReducer

Raises:
  • ImportError – If umap-learn is not installed.

  • RuntimeError – If umap-learn is installed but fails during initialization.

transform(X)#

Project data using the fitted Parametric UMAP model.

Parameters:

X (ArrayLike of shape (n_samples, n_features)) – Data to project.

Returns:

Low-dimensional embedding coordinates.

Return type:

np.ndarray of shape (n_samples, n_components)

Raises:

RuntimeError – If the reducer has not been fitted.

property loss_history_: list#

Training loss history for the parametric model.

Returns:

Recorded loss values across training epochs.

Return type:

list

Raises:

RuntimeError – If the reducer has not been fitted.
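A list of per-epoch losses such as the one returned by loss_history_ can be used to judge whether training has converged. The helper below is a hypothetical convenience function, not part of coco_pipe. It compares the mean loss of the most recent epochs with the mean of the epochs just before them.

```python
def has_plateaued(losses, window=3, rel_tol=0.01):
    """Return True if the windowed mean loss improved by less than rel_tol."""
    if len(losses) < 2 * window:
        return False  # not enough epochs to compare two windows
    previous = sum(losses[-2 * window:-window]) / window
    recent = sum(losses[-window:]) / window
    if previous == 0:
        return True  # loss is already zero, nothing left to improve
    return (previous - recent) / abs(previous) < rel_tol


still_improving = has_plateaued([1.0, 0.5, 0.3, 0.2, 0.15, 0.12])
converged = has_plateaued([1.0, 0.4, 0.2, 0.2, 0.2, 0.2, 0.2, 0.2])
```

If a check like this reports no plateau after the configured n_epochs, increasing n_epochs is a reasonable next step.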

save(filepath)#

Serialize the fitted reducer with joblib.

Parameters:

filepath (str) – Output path for the serialized reducer.

Raises:

RuntimeError – If the reducer has not been fitted.

Return type:

None