.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "auto_examples/dim_reduction/plot_01_compare_methods.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        :ref:`Go to the end <sphx_glr_download_auto_examples_dim_reduction_plot_01_compare_methods.py>`
        to download the full example code.

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_auto_examples_dim_reduction_plot_01_compare_methods.py:


============================================================
Comprehensive Comparison of Dimensionality Reduction Methods
============================================================

This example compares several dimensionality reduction algorithms (PCA, t-SNE,
UMAP, and PaCMAP) across different parameter settings using a synthetic
high-dimensional dataset. It demonstrates how hyperparameter choices can
drastically affect the resulting embeddings.

.. GENERATED FROM PYTHON SOURCE LINES 13-15

Imports and Setup
-----------------

.. GENERATED FROM PYTHON SOURCE LINES 15-34

.. code-block:: Python


    import os
    import time
    import warnings

    # Limit threads to prevent multiprocessing segfaults on macOS. These are set
    # before the numerical libraries are imported so that the limits are in place
    # when their threading runtimes initialize.
    os.environ["OMP_NUM_THREADS"] = "1"
    os.environ["LOKY_MAX_CPU_COUNT"] = "1"
    os.environ["NUMEXPR_MAX_THREADS"] = "1"

    import matplotlib.pyplot as plt
    from sklearn.datasets import make_classification

    from coco_pipe.dim_reduction import DimReduction
    from coco_pipe.viz.dim_reduction import plot_embedding

    # Suppress warnings for cleaner output
    warnings.filterwarnings("ignore")


.. GENERATED FROM PYTHON SOURCE LINES 35-39

1. Generate Synthetic Data
--------------------------
We create a synthetic dataset with 5 distinct classes embedded in a
50-dimensional space to simulate a complex, high-dimensional classification
problem. Of the 50 features, 10 are informative and 10 are redundant linear
combinations of the informative ones.

.. GENERATED FROM PYTHON SOURCE LINES 39-58

.. code-block:: Python


    n_samples = 2000
    n_features = 50
    n_classes = 5

    X, y = make_classification(
        n_samples=n_samples,
        n_features=n_features,
        n_informative=10,
        n_redundant=10,
        n_classes=n_classes,
        n_clusters_per_class=1,
        random_state=42,
    )

    print(f"Dataset shape: {X.shape}")
    print(f"Number of classes: {n_classes}")


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    Dataset shape: (2000, 50)
    Number of classes: 5


.. GENERATED FROM PYTHON SOURCE LINES 59-63

2. Define Methods and Parameters
--------------------------------
We will test PCA, t-SNE, UMAP, and PaCMAP, varying key parameters such as
perplexity or the number of neighbors to observe their effect on the resulting
topology.

.. GENERATED FROM PYTHON SOURCE LINES 63-88

.. code-block:: Python


    # Each entry is a (keyword arguments, display label) pair.
    method_params = {
        "PCA": [
            ({}, "Default"),
            ({"svd_solver": "randomized"}, "Randomized SVD"),
            ({"whiten": True}, "Whitened"),
        ],
        "TSNE": [
            ({"perplexity": 10, "max_iter": 500}, "perplexity=10"),
            ({"perplexity": 30, "max_iter": 500}, "perplexity=30"),
            ({"perplexity": 50, "max_iter": 500}, "perplexity=50"),
        ],
        "UMAP": [
            ({"n_neighbors": 10, "min_dist": 0.1}, "n_neighbors=10"),
            ({"n_neighbors": 20, "min_dist": 0.1}, "n_neighbors=20"),
            ({"n_neighbors": 50, "min_dist": 0.1}, "n_neighbors=50"),
        ],
        "Pacmap": [
            ({"n_neighbors": 10}, "n_neighbors=10"),
            ({"n_neighbors": 20}, "n_neighbors=20"),
            ({"n_neighbors": 50}, "n_neighbors=50"),
        ],
    }


.. GENERATED FROM PYTHON SOURCE LINES 89-93

3. Compute Embeddings
---------------------
We iterate over the methods and their parameter sets, computing the 2D
embedding and recording the elapsed time.

.. GENERATED FROM PYTHON SOURCE LINES 93-108

.. code-block:: Python


    results = {method: [] for method in method_params}

    for method, param_sets in method_params.items():
        print(f"Evaluating {method}...")
        for params, param_str in param_sets:
            reducer = DimReduction(method=method, n_components=2, **params)

            # perf_counter is a monotonic clock intended for measuring durations
            start_time = time.perf_counter()
            X_reduced = reducer.fit_transform(X)
            elapsed = time.perf_counter() - start_time

            results[method].append((X_reduced, param_str, elapsed))


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    Evaluating PCA...
    Evaluating TSNE...
    Evaluating UMAP...
    Evaluating Pacmap...


.. GENERATED FROM PYTHON SOURCE LINES 109-113

4. Visualize Grid Comparison
----------------------------
We plot the resulting embeddings in a grid, where columns represent methods and
rows represent different parameter settings. Each panel is labeled with its
parameter setting and the time taken to compute the embedding.

.. GENERATED FROM PYTHON SOURCE LINES 113-151

.. code-block:: Python


    methods = [m for m in method_params if len(results[m]) > 0]
    n_methods = len(methods)
    n_params = max(len(results[m]) for m in methods)

    fig, axes = plt.subplots(n_params, n_methods, figsize=(n_methods * 4, n_params * 4))

    for col, method in enumerate(methods):
        method_results = results[method]

        for row, (X_red, param_str, elapsed) in enumerate(method_results):
            ax = axes[row, col]

            # Use our viz module to plot the embedding
            plot_embedding(
                X_red,
                labels=y,
                ax=ax,
                s=5,
                alpha=0.6,
                palette="tab10",
                title=f"{method}" if row == 0 else "",
                label_kind="categorical",
            )

            ax.set_xlabel(f"{param_str}\n({elapsed:.2f}s)", fontsize=10)
            ax.set_xticks([])
            ax.set_yticks([])

    # Fill empty subplots if any
    for method in methods:
        n_results = len(results[method])
        for row in range(n_results, n_params):
            axes[row, methods.index(method)].axis("off")

    plt.suptitle("Dimension Reduction Methods Comparison", fontsize=18, y=1.02)
    plt.tight_layout()
    plt.show()


.. image-sg:: /auto_examples/dim_reduction/images/sphx_glr_plot_01_compare_methods_001.png
   :alt: Dimension Reduction Methods Comparison, PCA, TSNE, UMAP, Pacmap
   :srcset: /auto_examples/dim_reduction/images/sphx_glr_plot_01_compare_methods_001.png
   :class: sphx-glr-single-img


.. GENERATED FROM PYTHON SOURCE LINES 152-164

Conclusion
----------
This comparison illustrates that:

1. **PCA** provides a rapid baseline but, being a linear projection, struggles
   to separate complex non-linear structures.
2. **t-SNE** tends to produce visually distinct clusters but is highly
   sensitive to the perplexity parameter and takes longer to compute.
3. **UMAP** balances local and global structure preservation while remaining
   computationally efficient.
4. **PaCMAP** aims to preserve both local and global structures simultaneously,
   often rivaling UMAP in speed and cluster quality.


.. rst-class:: sphx-glr-timing

   **Total running time of the script:** (1 minutes 11.261 seconds)


.. _sphx_glr_download_auto_examples_dim_reduction_plot_01_compare_methods.py:

.. only:: html

  .. container:: sphx-glr-footer sphx-glr-footer-example

    .. container:: sphx-glr-download sphx-glr-download-jupyter

      :download:`Download Jupyter notebook: plot_01_compare_methods.ipynb <plot_01_compare_methods.ipynb>`

    .. container:: sphx-glr-download sphx-glr-download-python

      :download:`Download Python source code: plot_01_compare_methods.py <plot_01_compare_methods.py>`

    .. container:: sphx-glr-download sphx-glr-download-zip

      :download:`Download zipped: plot_01_compare_methods.zip <plot_01_compare_methods.zip>`


.. only:: html

 .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery <https://sphinx-gallery.github.io>`_
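
The conclusions above are drawn from visual inspection of the grid. As a
minimal sketch of how they could be checked numerically, the snippet below
scores a 2D embedding with scikit-learn's ``trustworthiness`` (local
neighborhood preservation) and ``silhouette_score`` (class separation in the
embedding). The helper name ``score_embedding``, the smaller 300-sample
dataset, and the use of scikit-learn's ``PCA`` in place of ``DimReduction`` are
assumptions made to keep the sketch self-contained; they are not part of the
example itself.

.. code-block:: Python

    from sklearn.datasets import make_classification
    from sklearn.decomposition import PCA
    from sklearn.manifold import trustworthiness
    from sklearn.metrics import silhouette_score


    def score_embedding(X_high, X_low, labels, n_neighbors=10):
        """Hypothetical helper: return (trustworthiness, silhouette) for an embedding."""
        # Trustworthiness lies in [0, 1]; higher means local neighborhoods
        # in the embedding better match those in the original space.
        trust = trustworthiness(X_high, X_low, n_neighbors=n_neighbors)
        # Silhouette lies in [-1, 1]; higher means the labeled classes are
        # more compact and better separated in the embedding.
        sil = silhouette_score(X_low, labels)
        return trust, sil


    # Smaller dataset than in the example above, so the sketch runs quickly.
    X_demo, y_demo = make_classification(
        n_samples=300,
        n_features=50,
        n_informative=10,
        n_redundant=10,
        n_classes=5,
        n_clusters_per_class=1,
        random_state=42,
    )

    X_pca = PCA(n_components=2, random_state=42).fit_transform(X_demo)
    trust, sil = score_embedding(X_demo, X_pca, y_demo)
    print(f"PCA: trustworthiness={trust:.3f}, silhouette={sil:.3f}")

The same helper could be applied to each ``X_reduced`` stored in ``results``,
giving one pair of scores per panel to set alongside the timing shown in the
grid. Neither score is a complete measure of embedding quality, so they are
best read together with the plots rather than in place of them.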