.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "auto_examples/dim_reduction/plot_02_quality_metrics.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        :ref:`Go to the end <sphx_glr_download_auto_examples_dim_reduction_plot_02_quality_metrics.py>`
        to download the full example code.

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_auto_examples_dim_reduction_plot_02_quality_metrics.py:


=====================================================================
Benchmarking Dimensionality Reduction: The Epistemology of Embeddings
=====================================================================

This example demonstrates the "Advanced Exploration and Benchmarking" pillar
of the ``coco_pipe`` strategic vision. Rather than judging an embedding by how
it looks, we quantify its distortion with three neighborhood-preservation
metrics: Trustworthiness, Continuity, and the Local Continuity Meta-Criterion
(LCMC).

We compare PCA (linear) and UMAP (non-linear) on the classic S-curve manifold,
a surface that is intrinsically 2D but embedded in 3D.

.. GENERATED FROM PYTHON SOURCE LINES 15-17

Imports
-------

.. GENERATED FROM PYTHON SOURCE LINES 17-31

.. code-block:: Python


    import os

    # Limit thread counts to prevent multiprocessing segfaults on macOS.
    # Set these before importing the numerical libraries so the limits are
    # in place when their thread pools initialize.
    os.environ["OMP_NUM_THREADS"] = "1"
    os.environ["LOKY_MAX_CPU_COUNT"] = "1"
    os.environ["NUMEXPR_MAX_THREADS"] = "1"

    import matplotlib.pyplot as plt
    from sklearn.datasets import make_s_curve

    from coco_pipe.dim_reduction import DimReduction
    from coco_pipe.viz.dim_reduction import plot_embedding


.. GENERATED FROM PYTHON SOURCE LINES 32-36

1. Generate Ground Truth Manifold
---------------------------------
The S-curve is a standard benchmark with intrinsic dimension 2.
We sample 1000 points from it.

.. GENERATED FROM PYTHON SOURCE LINES 36-53

.. code-block:: Python


    n_points = 1000
    X, color = make_s_curve(n_points, random_state=42)

    # Visualize the ground truth in 3D using our viz module
    fig, ax = plot_embedding(
        X,
        labels=color,
        dims=(0, 1, 2),
        title="Ground Truth: S-Curve Manifold",
        cmap="viridis",
        label_kind="continuous",
        s=20,
    )
    ax.view_init(10, -70)
    plt.show()




.. image-sg:: /auto_examples/dim_reduction/images/sphx_glr_plot_02_quality_metrics_001.png
   :alt: Ground Truth: S-Curve Manifold
   :srcset: /auto_examples/dim_reduction/images/sphx_glr_plot_02_quality_metrics_001.png
   :class: sphx-glr-single-img


.. GENERATED FROM PYTHON SOURCE LINES 54-58

2. Compare Embeddings
---------------------
We embed the 3D data into 2D with PCA and UMAP, then quantify the distortion
of each embedding.

.. GENERATED FROM PYTHON SOURCE LINES 58-78

.. code-block:: Python


    # Initialize the reducers
    reducers = {
        "PCA": DimReduction("PCA", n_components=2),
        "UMAP": DimReduction("UMAP", n_components=2, n_neighbors=15, min_dist=0.1),
    }

    results = {}

    for name, dr in reducers.items():
        print(f"Running {name}...")
        X_emb = dr.fit_transform(X)

        # Compute the quality metrics for this embedding.
        # The plotting code below reads them from scores["metrics"].
        scores = dr.score(X_emb, X=X)

        results[name] = {"embedding": X_emb, "scores": scores}





.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    Running PCA...
    Running UMAP...




.. GENERATED FROM PYTHON SOURCE LINES 79-86

3. Visualize and Quantify
-------------------------
We plot the 2D embeddings side by side and title each panel with its
Trustworthiness, Continuity, and LCMC scores.

- **Trustworthiness**: High means neighbors in 2D are real neighbors in 3D (no spurious neighbors).
- **Continuity**: High means 3D neighbors are preserved in 2D (no tearing).
- **LCMC**: High means the k-nearest-neighbor sets in 3D and 2D overlap more than expected by chance.

.. GENERATED FROM PYTHON SOURCE LINES 86-125

.. code-block:: Python


    fig, axes = plt.subplots(1, 2, figsize=(14, 6))

    for i, (name, res) in enumerate(results.items()):
        X_emb = res["embedding"]
        scores = res["scores"]

        # Extract metrics from the structured payload
        m = scores.get("metrics", {})
        trust = m.get("trustworthiness", 0.0)
        cont = m.get("continuity", 0.0)
        lcmc = m.get("lcmc", 0.0)

        ax = axes[i]

        title = f"{name}\n"
        title += f"Trustworthiness: {trust:.3f}\n"
        title += f"Continuity: {cont:.3f}\n"
        title += f"LCMC: {lcmc:.3f}"

        # Use the coco_pipe plotting function
        plot_embedding(
            X_emb,
            labels=color,
            title=title,
            cmap="viridis",
            label_kind="continuous",
            s=20,
            alpha=0.7,
            ax=ax,
        )
        ax.axis("tight")
        ax.set_xticks([])
        ax.set_yticks([])

    plt.tight_layout()
    plt.show()




.. image-sg:: /auto_examples/dim_reduction/images/sphx_glr_plot_02_quality_metrics_002.png
   :alt: PCA Trustworthiness: 0.978 Continuity: 0.991 LCMC: 0.332, UMAP Trustworthiness: 0.999 Continuity: 0.998 LCMC: 0.748
   :srcset: /auto_examples/dim_reduction/images/sphx_glr_plot_02_quality_metrics_002.png
   :class: sphx-glr-single-img


.. GENERATED FROM PYTHON SOURCE LINES 126-136

Interpretation
--------------
- **PCA**: Expected to have high **Continuity**, because the linear projection
  folds the S-curve onto itself and keeps true neighbors together, but lower
  **Trustworthiness**, because distant points overlap in the projection and
  become false neighbors. In this run PCA scores 0.991 for Continuity, 0.978
  for Trustworthiness, and 0.332 for LCMC.
- **UMAP**: Expected to have high **Trustworthiness** and **Continuity**, because
  it unrolls the manifold and preserves the local neighborhood structure
  without introducing false overlaps. In this run UMAP scores 0.999 for
  Trustworthiness, 0.998 for Continuity, and 0.748 for LCMC.

These scores give a more defensible basis for comparison than simply saying
"UMAP looks better."


.. rst-class:: sphx-glr-timing

   **Total running time of the script:** (0 minutes 1.975 seconds)


.. _sphx_glr_download_auto_examples_dim_reduction_plot_02_quality_metrics.py:

.. only:: html

  .. container:: sphx-glr-footer sphx-glr-footer-example

    .. container:: sphx-glr-download sphx-glr-download-jupyter

      :download:`Download Jupyter notebook: plot_02_quality_metrics.ipynb <plot_02_quality_metrics.ipynb>`

    .. container:: sphx-glr-download sphx-glr-download-python

      :download:`Download Python source code: plot_02_quality_metrics.py <plot_02_quality_metrics.py>`

    .. container:: sphx-glr-download sphx-glr-download-zip

      :download:`Download zipped: plot_02_quality_metrics.zip <plot_02_quality_metrics.zip>`


.. only:: html

 .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery <https://sphinx-gallery.github.io>`_
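
The rank-based definitions behind Trustworthiness, Continuity, and LCMC can be
sketched in plain Python. This is a minimal illustration under stated
assumptions, not necessarily how ``DimReduction.score`` computes its metrics:
the helper names (``neighbor_ranks``, ``trustworthiness``, ``continuity``,
``lcmc``), the toy S-curve, and the choice ``k = 10`` are all hypothetical.
The sketch compares an ideal "unrolled" embedding with a crude projection that
drops one axis and therefore folds the curve onto itself.

```python
import math
import random


def neighbor_ranks(points):
    """Return ranks[i][j]: position of j among i's neighbors (1 = nearest)."""
    n = len(points)
    ranks = [[0] * n for _ in range(n)]
    for i in range(n):
        order = sorted(
            (j for j in range(n) if j != i),
            key=lambda j: math.dist(points[i], points[j]),
        )
        for rank, j in enumerate(order, start=1):
            ranks[i][j] = rank
    return ranks


def trustworthiness(ranks_orig, ranks_emb, k):
    """Penalize embedded k-neighbors that are not k-neighbors in the original."""
    n = len(ranks_orig)
    penalty = sum(
        ranks_orig[i][j] - k
        for i in range(n)
        for j in range(n)
        if i != j and ranks_emb[i][j] <= k < ranks_orig[i][j]
    )
    # Standard normalization, valid for k < n / 2.
    return 1.0 - 2.0 * penalty / (n * k * (2 * n - 3 * k - 1))


def continuity(ranks_orig, ranks_emb, k):
    """Same formula with the two spaces swapped: penalizes lost neighbors."""
    return trustworthiness(ranks_emb, ranks_orig, k)


def lcmc(ranks_orig, ranks_emb, k):
    """Mean k-neighborhood overlap minus the overlap expected by chance."""
    n = len(ranks_orig)
    shared = sum(
        1
        for i in range(n)
        for j in range(n)
        if i != j and ranks_orig[i][j] <= k and ranks_emb[i][j] <= k
    )
    return shared / (n * k) - k / (n - 1)


# Hypothetical toy S-curve: t runs along the "S", y is the width direction.
rng = random.Random(0)
params = [(3 * math.pi * (rng.random() - 0.5), 2 * rng.random()) for _ in range(200)]
X = [(math.sin(t), y, math.copysign(1.0, t) * (math.cos(t) - 1)) for t, y in params]

embeddings = {
    "unrolled": [(t, y) for t, y in params],  # ideal flattening of the sheet
    "projected": [(x, y) for x, y, _ in X],   # drops z, so the S folds over
}

k = 10
ranks_X = neighbor_ranks(X)
scores = {}
for name, emb in embeddings.items():
    ranks_emb = neighbor_ranks(emb)
    scores[name] = {
        "trustworthiness": trustworthiness(ranks_X, ranks_emb, k),
        "continuity": continuity(ranks_X, ranks_emb, k),
        "lcmc": lcmc(ranks_X, ranks_emb, k),
    }
    print(name, {metric: round(value, 3) for metric, value in scores[name].items()})
```

Building the full rank matrix costs on the order of ``n**2 * log(n)``
operations, which is fine for a few hundred points. For larger datasets a
library routine such as ``sklearn.manifold.trustworthiness`` is the more
practical choice; calling it with the original and embedded arrays swapped
yields Continuity.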