coco_pipe.dim_reduction.evaluation.MethodSelector

class coco_pipe.dim_reduction.evaluation.MethodSelector(reducers)

Bases: object

Compare and rank already-scored dimensionality reduction methods.

MethodSelector is intentionally post-hoc: it neither fits reducers nor computes embeddings. Each reducer must already be a scored coco_pipe.dim_reduction.DimReduction instance with cached metric_records_.

Parameters:

reducers (dict or list of DimReduction) – Scored coco_pipe.dim_reduction.DimReduction objects to compare. A list is converted to a mapping keyed by each reducer's method name (reducer.method).

Variables:
  • reducers (dict of str to DimReduction) – Compared reductions keyed by method name.

  • metric_records_ (list of dict) – Cached long-form metric records populated by collect().
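
The list-to-mapping conversion described above can be pictured with a minimal sketch. This is not the library's implementation: normalize_reducers is a hypothetical helper, and the stand-in objects only mimic the method attribute of a scored DimReduction.

```python
from types import SimpleNamespace


def normalize_reducers(reducers):
    """Return a method-keyed dict from a dict or a list of reducer-like objects.

    Hypothetical helper for illustration; the real constructor may differ.
    """
    if isinstance(reducers, dict):
        # Already keyed by method name: copy so the caller's dict is untouched.
        return dict(reducers)
    # Each list item is assumed to expose a ``method`` attribute.
    return {reducer.method: reducer for reducer in reducers}


# Stand-ins for scored DimReduction objects (assumption: only ``method`` matters here).
stand_ins = [SimpleNamespace(method="PCA"), SimpleNamespace(method="Isomap")]
mapping = normalize_reducers(stand_ins)
print(sorted(mapping))  # ['Isomap', 'PCA']
```

In this sketch, two list entries sharing the same method name would collapse into one key, with the later entry winning. Whether MethodSelector behaves the same way is not stated here, so pass a dict with explicit keys when comparing several configurations of one method.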

See also

evaluate_embedding

Pure evaluator used upstream by DimReduction.score.

coco_pipe.dim_reduction.core.DimReduction.score

Scores a fitted reduction and populates the records consumed here.

Examples

>>> import numpy as np
>>> from coco_pipe.dim_reduction import DimReduction
>>> from coco_pipe.dim_reduction.evaluation import MethodSelector
>>> X = np.random.RandomState(0).randn(30, 4)
>>> reducers = [
...     DimReduction("PCA", n_components=2),
...     DimReduction("Isomap", n_components=2, n_neighbors=5),
... ]
>>> for reducer in reducers:
...     embedding = reducer.fit_transform(X)
...     reducer.score(embedding, X=X, k_values=[5])
>>> selector = MethodSelector(reducers).collect()
>>> frame = selector.to_frame()
>>> not frame.empty
True

classmethod from_records(records)

Create a selector directly from long-form metric records.

Parameters:

records (list[dict[str, Any]]) – Long-form metric records, one dictionary per metric observation.

Return type:

MethodSelector

classmethod from_frame(frame)

Create a selector directly from a metric-record DataFrame.

Parameters:

frame (DataFrame) – Long-form metric-record table, such as the one returned by to_frame().

Return type:

MethodSelector

collect()

Collect cached metric records from already-scored reducers.

Returns:

The selector populated with comparison-ready metric records.

Return type:

MethodSelector

Raises:

ValueError – If a reducer has not been scored yet.

See also

coco_pipe.dim_reduction.core.DimReduction.score

Populates the metric_records_ consumed by this method.

to_frame

Materialize the collected long-form records as a DataFrame.

Notes

collect() does not fit reducers or recompute evaluation metrics. It only gathers cached metric observations from reducers that were already scored explicitly.
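
A rough sketch of this gather-only behavior follows. It is an illustration under stated assumptions, not the library's code: collect_records is a hypothetical helper, the stand-in objects only carry a metric_records_ attribute, the record values are made up, and treating an empty or missing cache as "not scored" is a guess about how the ValueError condition is detected.

```python
from types import SimpleNamespace


def collect_records(reducers):
    """Gather cached records from a method-keyed mapping without refitting.

    Hypothetical helper; mirrors the documented contract only in spirit.
    """
    collected = []
    for name, reducer in reducers.items():
        records = getattr(reducer, "metric_records_", None)
        if not records:
            # Assumption: a missing or empty cache means score() was never called.
            raise ValueError(f"Reducer {name!r} has not been scored yet.")
        # Copy each record so later edits do not mutate the reducer's cache.
        collected.extend(dict(record) for record in records)
    return collected


# Stand-ins for DimReduction objects; the record values are illustrative only.
scored = SimpleNamespace(
    metric_records_=[
        {"method": "PCA", "metric": "trustworthiness", "value": 0.9,
         "scope": "k", "scope_value": 5},
    ]
)
unscored = SimpleNamespace()

print(len(collect_records({"PCA": scored})))  # 1
```

Calling collect_records({"Isomap": unscored}) raises ValueError in this sketch, which is the same failure mode documented for collect().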

Examples

>>> import numpy as np
>>> from coco_pipe.dim_reduction import DimReduction
>>> from coco_pipe.dim_reduction.evaluation import MethodSelector
>>> X = np.random.RandomState(0).randn(20, 4)
>>> reducer = DimReduction("PCA", n_components=2)
>>> embedding = reducer.fit_transform(X)
>>> reducer.score(embedding, X=X, k_values=[5])
>>> selector = MethodSelector([reducer]).collect()
>>> len(selector.metric_records_) > 0
True

to_frame()

Return the cached long-form metric table.

Returns:

Tidy metric table with columns method, metric, value, scope, and scope_value.

Return type:

pandas.DataFrame

Notes

This method only materializes a DataFrame at the public export boundary. Internally, MethodSelector stores metric records as plain Python dictionaries.
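
To make the long-form layout concrete, here is a small standard-library sketch of records stored as plain dictionaries and laid out as rows in the documented column order. The numeric values and the use of "k" as the scope label are assumptions made for illustration, not output from the library.

```python
# Column order taken from the documented return value of to_frame().
COLUMNS = ("method", "metric", "value", "scope", "scope_value")

# Illustrative records only: values and the "k" scope label are assumptions.
records = [
    {"method": "PCA", "metric": "trustworthiness", "value": 0.91,
     "scope": "k", "scope_value": 5},
    {"method": "PCA", "metric": "trustworthiness", "value": 0.88,
     "scope": "k", "scope_value": 10},
    {"method": "Isomap", "metric": "trustworthiness", "value": 0.93,
     "scope": "k", "scope_value": 5},
]

# One tuple per observation, which is the shape a tidy table would hold.
rows = [tuple(record[column] for column in COLUMNS) for record in records]

for row in rows:
    print(row)
```

With pandas installed, pandas.DataFrame.from_records(records, columns=COLUMNS) would build an equivalent table from the same dictionaries. Whether to_frame() does exactly this internally is not specified here.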

See also

collect

Gather cached metric records from scored reducers.

rank_methods

Rank reducers from the collected metric table.

Examples

>>> import numpy as np
>>> from coco_pipe.dim_reduction import DimReduction
>>> from coco_pipe.dim_reduction.evaluation import MethodSelector
>>> X = np.random.RandomState(0).randn(20, 4)
>>> reducer = DimReduction("PCA", n_components=2)
>>> embedding = reducer.fit_transform(X)
>>> reducer.score(embedding, X=X, k_values=[5])
>>> frame = MethodSelector([reducer]).collect().to_frame()
>>> {"method", "metric", "value"}.issubset(frame.columns)
True

rank_methods(selection_metric, *, selection_k=None, tie_breakers=None)

Rank methods using one primary metric and optional tie-breakers.

Parameters:
  • selection_metric (str) – Metric to optimize.

  • selection_k (int, optional) – Neighborhood size to compare for k-scoped metrics.

  • tie_breakers (sequence of str, optional) – Additional metrics used in order when primary values tie.

Returns:

Ranked comparison table. The first row is the best-scoring method under the requested ranking policy.

Return type:

pandas.DataFrame

Raises:

ValueError – If the requested metrics are unsupported, unavailable in the cached records, or missing the requested selection_k observations.

Notes

Ranking uses the mean value of each metric per method. For k-scoped metrics, passing selection_k restricts the comparison to that single neighborhood size.
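
The ranking policy in this note can be sketched in plain Python. This is a sketch under assumptions, not the library's algorithm: rank_by_mean is a hypothetical function, higher values are assumed to be better for every metric, k-scoped records are assumed to carry scope == "k" with the neighborhood size in scope_value, and the sample values are made up.

```python
from collections import defaultdict
from statistics import mean


def rank_by_mean(records, selection_metric, selection_k=None, tie_breakers=()):
    """Order method names by mean metric value, best first (hypothetical sketch)."""
    metrics = (selection_metric, *tie_breakers)
    values = defaultdict(lambda: defaultdict(list))
    for record in records:
        if record["metric"] not in metrics:
            continue
        # Assumed encoding: k-scoped records use scope == "k" and store k in scope_value.
        if (
            selection_k is not None
            and record.get("scope") == "k"
            and record.get("scope_value") != selection_k
        ):
            continue
        values[record["method"]][record["metric"]].append(record["value"])

    if not any(selection_metric in per_method for per_method in values.values()):
        raise ValueError(f"No observations for metric {selection_metric!r}.")

    def sort_key(method):
        key = []
        for metric in metrics:
            observed = values[method].get(metric)
            # Negate so larger means sort first; methods lacking a metric sort last.
            key.append(-mean(observed) if observed else float("inf"))
        return tuple(key)

    return sorted(values, key=sort_key)


# Illustrative records only.
records = [
    {"method": "PCA", "metric": "trustworthiness", "value": 0.90, "scope": "k", "scope_value": 5},
    {"method": "PCA", "metric": "trustworthiness", "value": 0.80, "scope": "k", "scope_value": 10},
    {"method": "Isomap", "metric": "trustworthiness", "value": 0.85, "scope": "k", "scope_value": 5},
    {"method": "Isomap", "metric": "trustworthiness", "value": 0.95, "scope": "k", "scope_value": 10},
]

print(rank_by_mean(records, "trustworthiness"))                 # ['Isomap', 'PCA']
print(rank_by_mean(records, "trustworthiness", selection_k=5))  # ['PCA', 'Isomap']
```

The two calls show why selection_k matters: averaged over both neighborhood sizes the Isomap stand-in leads, while restricting to k=5 puts PCA first. Metrics where lower is better would need the sign flipped, which this sketch does not handle.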

See also

collect

Gather cached metric observations before ranking.

to_frame

Inspect the underlying long-form metric observations directly.

coco_pipe.dim_reduction.core.DimReduction.score

Produces the metric records that feed into ranking.

Examples

>>> import numpy as np
>>> from coco_pipe.dim_reduction import DimReduction
>>> from coco_pipe.dim_reduction.evaluation import MethodSelector
>>> X = np.random.RandomState(0).randn(20, 4)
>>> reducers = [DimReduction("PCA", n_components=2)]
>>> reducer = reducers[0]
>>> embedding = reducer.fit_transform(X)
>>> reducer.score(embedding, X=X, k_values=[5])
>>> ranked = (
...     MethodSelector(reducers)
...     .collect()
...     .rank_methods(
...         "trustworthiness",
...         selection_k=5,
...     )
... )
>>> ranked.iloc[0]["method"] == reducer.method
True