.. _dim-reduction-guide:

===============================
Core Workflow and Configuration
===============================

.. _dim-reduction-core:

``DimReduction`` Manager
========================

:class:`~coco_pipe.dim_reduction.DimReduction` is the single public manager
class. It wraps one reducer, drives ``fit`` / ``transform``, delegates scoring
to the pure evaluator, runs interpretation analyses, and exposes a consolidated
``get_summary()`` payload.

The manager does **not** cache embeddings. Every method that needs an embedding
takes it as an explicit argument.

----

1. Construction
---------------

.. code-block:: python

   from coco_pipe.dim_reduction import DimReduction
   from coco_pipe.dim_reduction.config import UMAPConfig

   # By method name
   reducer = DimReduction("UMAP", n_components=2, n_neighbors=15)

   # With a typed config
   reducer = DimReduction(UMAPConfig(n_components=2, n_neighbors=15))

Resolution order for reducer kwargs (later overrides earlier):

1. Fields from the config (if a :class:`BaseReducerConfig` is passed).
2. The ``params=`` dictionary.
3. Additional ``**kwargs``.
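
The precedence above behaves like a left-to-right dictionary merge. The sketch
below is illustrative only: ``resolve_reducer_kwargs`` is a hypothetical helper,
not part of ``coco_pipe``, and it assumes the config fields have already been
dumped to a plain dict.

.. code-block:: python

   def resolve_reducer_kwargs(config_fields=None, params=None, **kwargs):
       """Merge reducer kwargs; later sources override earlier ones."""
       merged = {}
       merged.update(config_fields or {})   # 1. fields from the typed config
       merged.update(params or {})          # 2. the params= dictionary
       merged.update(kwargs)                # 3. additional **kwargs
       return merged


   resolved = resolve_reducer_kwargs(
       config_fields={"n_components": 2, "n_neighbors": 15},
       params={"n_neighbors": 30, "min_dist": 0.1},
       n_neighbors=50,
   )
   print(resolved)
   # {'n_components': 2, 'n_neighbors': 50, 'min_dist': 0.1}

Here ``n_neighbors`` appears in all three sources, and the keyword argument
wins because it is applied last.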

----

2. Lifecycle Methods
--------------------

.. code-block:: python

   reducer.fit(X)                        # fit only
   embedding = reducer.transform(X)      # transform an already-fitted reducer
   embedding = reducer.fit_transform(X)  # combined

``fit`` returns ``self``; ``transform`` and ``fit_transform`` return a NumPy
array. All three reset the cached ``metrics_``, ``diagnostics_``,
``metric_records_``, and ``interpretation_`` attributes so that stale
evaluations cannot leak across runs.

``DimReduction.get_components()`` returns the reducer's components for linear
methods (PCA, IncrementalPCA, DaskPCA, DaskTruncatedSVD). It raises an error
for methods that do not expose components.

----

3. Scoring
----------

.. code-block:: python

   scores = reducer.score(
       embedding,                        # required, explicit
       X=X,                              # required for 2D metrics
       n_neighbors=5,                    # single-score neighborhood
       metrics=["trustworthiness", "continuity"],
       k_values=[5, 10, 20],             # multi-scale sweep
       labels=labels,                    # used for trajectory_separation
       groups=groups,                    # for supervised separation metrics
       times=times,                      # for trajectory AUC
       separation_method="centroid",
   )

Returns a dict with three keys:

.. list-table::
   :header-rows: 0
   :widths: 20 80

   * - ``metrics``
     - Scalar metric summary (also cached on ``metrics_``).
   * - ``metadata``
     - Scalar descriptive metadata (also cached on ``quality_metadata_``).
   * - ``diagnostics``
     - Array / structured diagnostics (e.g., ``shepard_distances``,
       ``coranking_matrix``, trajectory timecourses) — also cached on
       ``diagnostics_``.

Tidy long-form records are cached on ``metric_records_`` for downstream
ranking and reporting.
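
As a rough illustration of what "tidy long-form" means here, the sketch below
flattens a nested multi-scale result into one row per (metric, k) pair. The
nested input layout, the ``to_records`` helper, and the ``metric`` / ``k`` /
``value`` keys are assumptions made for this example; the actual
``metric_records_`` schema may differ.

.. code-block:: python

   def to_records(sweep):
       """Flatten {metric: {k: value}} into one dict per (metric, k) pair."""
       return [
           {"metric": metric, "k": k, "value": value}
           for metric, by_k in sweep.items()
           for k, value in sorted(by_k.items())
       ]


   # Made-up numbers, shown only to illustrate the layout.
   sweep = {
       "trustworthiness": {5: 0.97, 10: 0.95},
       "continuity": {5: 0.96, 10: 0.94},
   }
   records = to_records(sweep)

One row per observation makes it straightforward to filter by ``k`` or sort by
``value`` when ranking reducers.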

.. admonition:: 2D vs. 3D shape routing

   The evaluator chooses standard or trajectory metrics from ``embedding.shape``,
   not from the reducer name. Pass a ``(n_samples, n_components)`` embedding
   for standard metrics; pass a ``(n_trajectories, n_times, n_dims)`` tensor
   for trajectory metrics. See :ref:`dim-reduction-trajectories`.
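
The routing rule above can be pictured as a dispatch on the number of array
dimensions. This is a minimal sketch, not the evaluator's actual code:
``route_metric_family`` is a hypothetical name, and raising ``ValueError`` for
other shapes is an assumption.

.. code-block:: python

   import numpy as np


   def route_metric_family(embedding):
       """Pick a metric family from the embedding's shape alone."""
       embedding = np.asarray(embedding)
       if embedding.ndim == 2:      # (n_samples, n_components)
           return "standard"
       if embedding.ndim == 3:      # (n_trajectories, n_times, n_dims)
           return "trajectory"
       raise ValueError(
           f"expected a 2D or 3D embedding, got shape {embedding.shape}"
       )


   flat = np.zeros((100, 2))
   trajectories = np.zeros((8, 50, 2))
   print(route_metric_family(flat), route_metric_family(trajectories))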

----

4. Interpretation
-----------------

.. code-block:: python

   result = reducer.interpret(
       X,
       X_emb=embedding,
       analyses=["correlation", "perturbation", "gradient"],
       feature_names=feature_names,
       n_repeats=5,
       random_state=42,
   )

Returns ``{"analysis": ..., "records": [...]}``. The ``analysis`` payload is
keyed by analysis name; ``records`` is a tidy long-form list ready for plotting
and reports. They are cached on ``interpretation_`` and
``interpretation_records_``, respectively.

Supported analyses:

- ``"correlation"`` — Spearman correlations between input features and embedding
  axes. Works for any reducer.
- ``"perturbation"`` — model-agnostic feature importance from per-feature
  shuffling. Requires a fitted reducer with ``transform``.
- ``"gradient"`` — encoder saliency for supported torch-based reducers
  (``IVIS``, ``ParametricUMAP``, ``TopologicalAE``).
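
To make the ``"perturbation"`` idea concrete, here is a minimal NumPy sketch of
per-feature shuffling importance. It is not the library's implementation:
``perturbation_importance`` is a hypothetical helper, the score (mean Euclidean
displacement of the embedded points) is one reasonable choice among several,
and a fixed linear projection stands in for a fitted reducer's ``transform``.

.. code-block:: python

   import numpy as np


   def perturbation_importance(transform, X, n_repeats=5, random_state=42):
       """Mean embedding displacement caused by shuffling each feature."""
       rng = np.random.default_rng(random_state)
       baseline = transform(X)
       importances = np.zeros(X.shape[1])
       for j in range(X.shape[1]):
           shifts = []
           for _ in range(n_repeats):
               X_perm = X.copy()
               # Shuffle feature j only; all other columns stay intact.
               X_perm[:, j] = rng.permutation(X_perm[:, j])
               moved = np.linalg.norm(transform(X_perm) - baseline, axis=1)
               shifts.append(moved.mean())
           importances[j] = np.mean(shifts)
       return importances


   # Stand-in "reducer": a projection that ignores the third feature.
   W = np.array([[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]])
   X = np.random.default_rng(0).normal(size=(200, 3))
   scores = perturbation_importance(lambda A: A @ W, X)

Because the stand-in projection discards the third feature, shuffling it leaves
the embedding unchanged and its score stays at zero, while the first two
features receive positive scores.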

See :ref:`dim-reduction-interpretation` for the math and reducer requirements.

----

5. Inspecting Cached State
--------------------------

.. code-block:: python

   reducer.get_metrics()           # scalar metrics_
   reducer.get_quality_metadata()  # metadata from reducer + evaluator
   reducer.get_diagnostics()       # full diagnostics_ payload
   reducer.get_summary()           # combined: metrics + metadata + diagnostics
                                   # + metric_records + interpretation + capabilities

``get_summary()`` is the canonical input for :meth:`coco_pipe.report.Report.add_reduction`
and is JSON-serializable. **It deliberately does not carry an embedding** —
pass embeddings explicitly to plotting and reporting paths that need them.
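
Because the summary omits the embedding, one workable pattern is to store the
two side by side. The sketch below uses a hand-written stand-in dict in place
of a real ``get_summary()`` result (its keys and values are illustrative) and
writes to a temporary directory; the file names are arbitrary.

.. code-block:: python

   import json
   import tempfile
   from pathlib import Path

   import numpy as np

   # Stand-in for reducer.get_summary(); the real payload has more keys.
   summary = {"metrics": {"trustworthiness": 0.95}, "metadata": {"method": "UMAP"}}
   embedding = np.zeros((100, 2))   # stand-in for the fit_transform output

   out_dir = Path(tempfile.mkdtemp())
   (out_dir / "summary.json").write_text(json.dumps(summary, indent=2))
   np.save(out_dir / "embedding.npy", embedding)   # stored separately

   restored = json.loads((out_dir / "summary.json").read_text())
   restored_embedding = np.load(out_dir / "embedding.npy")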

----

6. Capabilities
---------------

.. code-block:: python

   caps = reducer.capabilities
   # {'is_linear': True, 'has_components': True, 'has_loss_history': False, ...}

The manager exposes the reducer's capability dict directly. Capabilities are
used by:

- the evaluator to skip metrics the reducer cannot support,
- :func:`coco_pipe.viz.dim_reduction.plot_loss_history` to detect available
  loss curves,
- :class:`~coco_pipe.dim_reduction.evaluation.MethodSelector` for capability-aware
  filtering in comparison tables.
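
Downstream code can apply the same guard pattern. In this sketch the dict
literal mirrors the example above; ``available_views`` is a hypothetical helper
that is not part of ``coco_pipe``, and the view names are made up.

.. code-block:: python

   def available_views(caps):
       """List the optional views a reducer supports, given its capabilities."""
       views = ["embedding"]                      # always available
       if caps.get("has_components", False):
           views.append("component_loadings")
       if caps.get("has_loss_history", False):
           views.append("loss_history")
       return views


   caps = {"is_linear": True, "has_components": True, "has_loss_history": False}
   print(available_views(caps))
   # ['embedding', 'component_loadings']

Using ``dict.get`` with a ``False`` default means a missing capability key is
treated as unsupported rather than raising ``KeyError``.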

----

7. Persistence
--------------

.. code-block:: python

   reducer.save("models/umap.pkl")
   loaded = DimReduction.load("models/umap.pkl", method="UMAP")

``save`` writes the fitted reducer with its kwargs and method name; ``load``
re-instantiates the manager and restores the reducer's fitted state. Cached
evaluation payloads are **not** persisted — re-run ``score()`` on the
embedding you have.

----

8. End-to-End Skeleton
----------------------

.. code-block:: python

   from coco_pipe.dim_reduction import DimReduction
   from coco_pipe.viz import plot_embedding, plot_metrics

   # Assumes X (n_samples, n_features), feature_names, and class_ids already exist.
   reducer = DimReduction("UMAP", n_components=2, n_neighbors=15, random_state=42)
   embedding = reducer.fit_transform(X)
   reducer.score(embedding, X=X, k_values=[5, 10, 20])
   reducer.interpret(X, X_emb=embedding, analyses=["correlation"],
                     feature_names=feature_names)
   summary = reducer.get_summary()

   plot_embedding(embedding, labels=class_ids)
   plot_metrics(reducer)                    # accepts the manager directly

For a multi-reducer comparison, see
:class:`~coco_pipe.dim_reduction.evaluation.MethodSelector` in
:ref:`dim-reduction-evaluation`.

.. _dim-reduction-configs:

Configuration Reference
=======================

Every reducer in ``coco_pipe.dim_reduction`` can be configured either through
keyword arguments to :class:`~coco_pipe.dim_reduction.DimReduction` *or* through
a typed pydantic config. Configs validate field names, types, and ranges at
parse time, so typos and incompatible options fail before any data is touched.

----

1. The Two Equivalent Construction Styles
-----------------------------------------

.. code-block:: python

   from coco_pipe.dim_reduction import DimReduction
   from coco_pipe.dim_reduction.config import UMAPConfig

   # Keyword-style (string method name + kwargs)
   reducer = DimReduction("UMAP", n_components=2, n_neighbors=15, min_dist=0.1)

   # Config-style (typed pydantic model)
   reducer = DimReduction(UMAPConfig(n_components=2, n_neighbors=15, min_dist=0.1))

Prefer the config style when:

- Reducer parameters come from a YAML or JSON file (unpack the parsed mapping
  into the config class, e.g. ``UMAPConfig(**data)``).
- You want a single object that can be serialized, logged, or reused across
  experiments.
- You want strict validation of every field at construction time.

Use the keyword style for one-off scripts and exploration.

----

2. Base Config
--------------

.. code-block:: python

   from coco_pipe.dim_reduction.config import BaseReducerConfig

All reducer configs inherit from :class:`BaseReducerConfig`. Common fields:

- ``method`` — canonical reducer name (``Literal``, immutable per subclass).
- ``n_components`` — target dimensionality, default 2.

A mixin, :class:`StochasticReducerConfig`, adds a ``random_state`` field
(default 42) for reducers that accept a seed.

----

3. Reducer Configs
------------------

The full set of typed configs maps one-to-one onto the registry in
:data:`coco_pipe.dim_reduction.config.METHODS`.

.. list-table::
   :header-rows: 1
   :widths: 22 28 50

   * - Family
     - Config class
     - Key fields
   * - **Linear**
     - ``PCAConfig``
     - ``whiten``, ``svd_solver``
   * -
     - ``IncrementalPCAConfig``
     - ``batch_size``, ``whiten``
   * -
     - ``DaskPCAConfig``
     - ``svd_solver``
   * -
     - ``DaskTruncatedSVDConfig``
     - ``algorithm``
   * - **Manifold**
     - ``IsomapConfig``
     - ``n_neighbors``, ``metric``, ``p``
   * -
     - ``LLEConfig``
     - ``n_neighbors``, ``lle_method``
   * -
     - ``MDSConfig``
     - ``metric``, ``n_init``, ``dissimilarity``
   * -
     - ``SpectralEmbeddingConfig``
     - ``affinity``, ``gamma``
   * - **Neighbor**
     - ``TSNEConfig``
     - ``perplexity``, ``early_exaggeration``, ``learning_rate``, ``max_iter``, ``init``
   * -
     - ``UMAPConfig``
     - ``n_neighbors``, ``min_dist``, ``metric``, ``spread``
   * -
     - ``ParametricUMAPConfig``
     - ``n_neighbors``, ``min_dist``, ``batch_size``
   * -
     - ``PacmapConfig``
     - ``n_neighbors``, ``MN_ratio``, ``FP_ratio``, ``nn_backend``, ``init``
   * -
     - ``TrimapConfig``
     - ``n_inliers``, ``n_outliers``, ``n_random``
   * -
     - ``PHATEConfig``
     - ``knn``, ``decay``, ``t``
   * - **Spatiotemporal**
     - ``DMDConfig``
     - ``tlsq_rank``, ``exact``, ``opt``
   * -
     - ``TRCAConfig``
     - ``sfreq``, ``filterbank``
   * - **Neural / Topology**
     - ``IVISConfig``
     - ``k``, ``model``, ``n_epochs_without_progress``
   * -
     - ``TopologicalAEConfig``
     - ``hidden_dims``, ``lam``, ``lr``, ``batch_size``, ``epochs``, ``device``

See :ref:`dim-reduction-reducers` for what each reducer does and when to use
it.

3.1 Example: full UMAP config
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. code-block:: python

   from coco_pipe.dim_reduction.config import UMAPConfig

   config = UMAPConfig(
       n_components=2,
       n_neighbors=15,
       min_dist=0.1,
       metric="euclidean",
       spread=1.0,
       random_state=42,
   )

3.2 Example: LLE name renaming
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The pydantic ``method`` field is reserved for reducer selection, so LLE's
sklearn parameter ``method`` is exposed as ``lle_method`` and renamed back
via :meth:`LLEConfig.to_reducer_kwargs`:

.. code-block:: python

   from coco_pipe.dim_reduction.config import LLEConfig

   config = LLEConfig(n_components=2, n_neighbors=10, lle_method="hessian")
   # to_reducer_kwargs() yields {"n_components": 2, "n_neighbors": 10, "method": "hessian"}
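
The rename can be pictured with a small stand-in. The dataclass below is a
hypothetical illustration of the pattern, not the actual pydantic model; the
class name ``LLEConfigSketch`` and the default values are assumptions.

.. code-block:: python

   from dataclasses import asdict, dataclass


   @dataclass
   class LLEConfigSketch:
       n_components: int = 2
       n_neighbors: int = 5
       lle_method: str = "standard"

       def to_reducer_kwargs(self):
           """Dump the fields, restoring the name the reducer expects."""
           kwargs = asdict(self)
           kwargs["method"] = kwargs.pop("lle_method")   # lle_method -> method
           return kwargs


   config = LLEConfigSketch(n_components=2, n_neighbors=10, lle_method="hessian")
   kwargs = config.to_reducer_kwargs()

Keeping the rename inside a single method means callers never have to remember
which spelling the underlying estimator uses.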

----

4. Evaluation Config
--------------------

.. code-block:: python

   from coco_pipe.dim_reduction.config import EvaluationConfig

   eval_config = EvaluationConfig(
       metrics=["trustworthiness", "continuity", "lcmc"],
       k_range=[5, 10, 20, 50, 100],
       selection_metric="trustworthiness",
       selection_k=10,
       tie_breakers=["continuity"],
       separation_method="centroid",
   )

Fields:

================================  ==================================================
``metrics``                       Metric families to compute. Must be canonical
                                  evaluator names (see :ref:`dim-reduction-evaluation`).
                                  No duplicates; at least one entry.
``k_range``                       Neighborhood sizes for multi-scale metrics
                                  (``trustworthiness``, ``continuity``, ``lcmc``,
                                  ``mrre_*``). Positive integers, no duplicates.
``selection_metric``              Primary ranking metric. Must be in
                                  ``_VALID_RANKING_METRICS`` *and* in
                                  ``metrics``.
``selection_k``                   Neighborhood size used when ranking a
                                  ``k``-scoped metric.
``tie_breakers``                  Ordered list of additional ranking metrics.
                                  Each must also be present in ``metrics``.
``separation_method``             Method used to compute trajectory
                                  separation: ``"centroid"`` (default),
                                  ``"within_between_ratio"``, ``"mahalanobis"``,
                                  ``"distributional"``, ``"margin"``.
================================  ==================================================

.. admonition:: Early validation pays off

   ``EvaluationConfig`` rejects unknown metric names, duplicate entries,
   invalid separation methods, and ranking metrics that are not in ``metrics``.
   You won't run a 10-minute scoring loop only to find the ranker has nothing
   to rank with.
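
The cross-field checks described above amount to something like the following.
This is a plain-Python sketch of the rules, not the pydantic validators
themselves: ``check_evaluation_fields`` is a hypothetical name, and the checks
against the canonical metric registry and separation methods are omitted.

.. code-block:: python

   def check_evaluation_fields(metrics, k_range, selection_metric, tie_breakers=()):
       """Raise ValueError if the fields violate the documented constraints."""
       if not metrics:
           raise ValueError("metrics must contain at least one entry")
       if len(set(metrics)) != len(metrics):
           raise ValueError("metrics must not contain duplicates")
       if len(set(k_range)) != len(k_range):
           raise ValueError("k_range must not contain duplicates")
       if any(not isinstance(k, int) or k <= 0 for k in k_range):
           raise ValueError("k_range must contain positive integers")
       if selection_metric not in metrics:
           raise ValueError(f"selection_metric {selection_metric!r} is not in metrics")
       missing = [m for m in tie_breakers if m not in metrics]
       if missing:
           raise ValueError(f"tie_breakers not in metrics: {missing}")


   # Mirrors the example config above: passes silently.
   check_evaluation_fields(
       metrics=["trustworthiness", "continuity", "lcmc"],
       k_range=[5, 10, 20, 50, 100],
       selection_metric="trustworthiness",
       tie_breakers=["continuity"],
   )

   # A ranking metric that was never requested is rejected up front.
   try:
       check_evaluation_fields(["trustworthiness"], [5, 10], "continuity")
   except ValueError as err:
       error_message = str(err)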

----

5. Configs from YAML / JSON
---------------------------

All configs are standard pydantic models, so loading from a serialized form is
direct:

.. code-block:: python

   import yaml

   from coco_pipe.dim_reduction import DimReduction
   from coco_pipe.dim_reduction.config import UMAPConfig

   with open("umap.yaml") as f:
       data = yaml.safe_load(f)            # plain dict of field values
   config = UMAPConfig(**data)             # validation runs here

   reducer = DimReduction(config)
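
A JSON file follows the same parse-then-unpack pattern. The sketch below uses a
stand-in dataclass so that it runs without ``coco_pipe``; ``UMAPConfigSketch``
is hypothetical, and the real config would presumably report a bad field
through pydantic's validation error rather than ``TypeError``.

.. code-block:: python

   import json
   from dataclasses import dataclass


   @dataclass(frozen=True)
   class UMAPConfigSketch:
       n_components: int = 2
       n_neighbors: int = 15
       min_dist: float = 0.1


   good = json.loads('{"n_components": 2, "n_neighbors": 30}')
   config = UMAPConfigSketch(**good)

   bad = json.loads('{"n_neighbours": 30}')   # typo in the field name
   try:
       UMAPConfigSketch(**bad)
       rejected = False
   except TypeError:
       rejected = True                        # unknown field fails at construction

In both cases the failure happens when the config object is built, before any
data is loaded or any reducer is fitted.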