coco_pipe.dim_reduction.evaluation.MethodSelector

class coco_pipe.dim_reduction.evaluation.MethodSelector(reducers)

Bases: object

Compare and rank already-scored dimensionality reduction methods.

MethodSelector is intentionally post-hoc: it neither fits reducers nor computes embeddings. Each reducer must already be a scored coco_pipe.dim_reduction.DimReduction instance with cached metric_records_.

Parameters:

reducers (dict or list of DimReduction) – Scored coco_pipe.dim_reduction.DimReduction objects to compare. Lists are converted to a method-keyed mapping using reducer.method.

Variables:

reducers (dict of str to DimReduction) – Compared reductions keyed by method name.
metric_records_ (list of dict) – Cached long-form metric records populated by collect().

See also

evaluate_embedding: Pure evaluator used upstream by DimReduction.score.
coco_pipe.dim_reduction.core.DimReduction.score: Scores a fitted reduction and populates the records consumed here.

Examples

>>> import numpy as np
>>> from coco_pipe.dim_reduction import DimReduction
>>> from coco_pipe.dim_reduction.evaluation import MethodSelector
>>> X = np.random.RandomState(0).randn(30, 4)
>>> reducers = [
...     DimReduction("PCA", n_components=2),
...     DimReduction("Isomap", n_components=2, n_neighbors=5),
... ]
>>> for reducer in reducers:
...     embedding = reducer.fit_transform(X)
...     reducer.score(embedding, X=X, k_values=[5])
>>> selector = MethodSelector(reducers).collect()
>>> frame = selector.to_frame()
>>> not frame.empty
True

classmethod from_records(records)

Create a selector directly from long-form metric records.

Parameters: records (list[dict[str, Any]])
Return type: MethodSelector
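
The record schema is not spelled out for this constructor. As a rough sketch, assuming each record carries the same keys as the columns documented for to_frame() (method, metric, value, scope, scope_value), a hand-built record list could be checked like this before being passed in. The scope label "k", the numeric values, and the validate_records helper are illustrative assumptions, not part of coco_pipe.

```python
# Hypothetical long-form metric records: one dict per observation.
# Keys mirror the columns documented for to_frame(); the scope label "k"
# and all values are made up for illustration.
records = [
    {"method": "PCA", "metric": "trustworthiness", "value": 0.91,
     "scope": "k", "scope_value": 5},
    {"method": "PCA", "metric": "trustworthiness", "value": 0.88,
     "scope": "k", "scope_value": 10},
    {"method": "Isomap", "metric": "trustworthiness", "value": 0.93,
     "scope": "k", "scope_value": 5},
]

REQUIRED_KEYS = {"method", "metric", "value", "scope", "scope_value"}


def validate_records(records):
    """Return the records unchanged if every one has the assumed keys."""
    for index, record in enumerate(records):
        missing = REQUIRED_KEYS - record.keys()
        if missing:
            raise ValueError(f"record {index} is missing keys: {sorted(missing)}")
    return records


validated = validate_records(records)

# With coco_pipe installed, the validated list could then be handed over:
# selector = MethodSelector.from_records(validated)
```

Checking the keys up front is only a convenience here. Whether from_records performs its own validation is not stated in this reference.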

classmethod from_frame(frame)

Create a selector directly from a metric-record DataFrame.

Parameters: frame (DataFrame)
Return type: MethodSelector

collect()

Collect cached metric records from already-scored reducers.

Returns: The selector populated with comparison-ready metric records.
Return type: MethodSelector
Raises: ValueError – If a reducer has not been scored yet.

See also

coco_pipe.dim_reduction.core.DimReduction.score: Populates the metric_records_ consumed by this method.
to_frame: Materialize the collected long-form records as a DataFrame.

Notes

collect() does not fit reducers or recompute evaluation metrics. It only gathers cached metric observations from reducers that were already scored explicitly.
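
The gathering behaviour described above can be sketched without the library. The following is a minimal illustration, not the actual implementation: FakeReducer is a hypothetical stand-in for DimReduction, and it assumes that an unscored reducer has an empty or missing metric_records_ attribute.

```python
class FakeReducer:
    """Hypothetical stand-in for a DimReduction object (not the real class)."""

    def __init__(self, method, metric_records=None):
        self.method = method
        # Assumption: this stays None until the reducer has been scored.
        self.metric_records_ = metric_records


def collect_records(reducers):
    """Gather cached records from a method-keyed mapping.

    Nothing is fitted or re-scored; only the cached observations are read.
    """
    collected = []
    for name, reducer in reducers.items():
        cached = getattr(reducer, "metric_records_", None)
        if not cached:
            raise ValueError(f"Reducer {name!r} has not been scored yet.")
        # Copy each record so the reducer's cache is left untouched.
        collected.extend(dict(record) for record in cached)
    return collected


scored = FakeReducer(
    "PCA",
    [{"method": "PCA", "metric": "trustworthiness", "value": 0.9,
      "scope": "k", "scope_value": 5}],
)
unscored = FakeReducer("Isomap")

# A list of reducers is keyed by reducer.method, as described for the constructor.
mapping = {reducer.method: reducer for reducer in [scored]}
records = collect_records(mapping)
```

Passing a mapping that includes the unscored stand-in makes collect_records raise ValueError, mirroring the documented failure mode.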

Examples

>>> import numpy as np
>>> from coco_pipe.dim_reduction import DimReduction
>>> from coco_pipe.dim_reduction.evaluation import MethodSelector
>>> X = np.random.RandomState(0).randn(20, 4)
>>> reducer = DimReduction("PCA", n_components=2)
>>> embedding = reducer.fit_transform(X)
>>> reducer.score(embedding, X=X, k_values=[5])
>>> selector = MethodSelector([reducer]).collect()
>>> len(selector.metric_records_) > 0
True

to_frame()

Return the cached long-form metric table.

Returns: Tidy metric table with columns method, metric, value, scope, and scope_value.
Return type: pandas.DataFrame

Notes

This method only materializes a DataFrame at the public export boundary. Internally, MethodSelector stores metric records as plain Python dictionaries.

See also

collect: Gather cached metric records from scored reducers.
rank_methods: Rank reducers from the collected metric table.

Examples

>>> import numpy as np
>>> from coco_pipe.dim_reduction import DimReduction
>>> from coco_pipe.dim_reduction.evaluation import MethodSelector
>>> X = np.random.RandomState(0).randn(20, 4)
>>> reducer = DimReduction("PCA", n_components=2)
>>> embedding = reducer.fit_transform(X)
>>> reducer.score(embedding, X=X, k_values=[5])
>>> frame = MethodSelector([reducer]).collect().to_frame()
>>> {"method", "metric", "value"}.issubset(frame.columns)
True

rank_methods(selection_metric, *, selection_k=None, tie_breakers=None)

Rank methods using one primary metric and optional tie-breakers.

Parameters:

selection_metric (str) – Metric to optimize.
selection_k (int, optional) – Neighborhood size to compare for k-scoped metrics.
tie_breakers (sequence of str, optional) – Additional metrics used in order when primary values tie.

Returns:

Ranked comparison table. The first row is the best-scoring method under the requested ranking policy.

Return type:

pandas.DataFrame

Raises:

ValueError – If the requested metrics are unsupported, unavailable in the cached records, or missing the requested selection_k observations.

Notes

Ranking is based on mean metric values per method. For k-scoped metrics, selection_k restricts comparison to a single neighborhood size when requested.
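
To make the ranking policy concrete, here is a small pure-Python sketch of mean-based ranking with an optional neighborhood filter and ordered tie-breakers. It is an illustration under stated assumptions, not the library's code: it assumes higher values are better for every metric, that k-scoped records use the scope label "k", and that records have the keys listed for to_frame(). The rank_by_mean helper and the sample values are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def rank_by_mean(records, selection_metric, selection_k=None, tie_breakers=()):
    """Return method names ordered best first (assumes higher is better)."""

    def method_means(metric):
        values = defaultdict(list)
        for record in records:
            if record["metric"] != metric:
                continue
            # For k-scoped records (assumed label "k"), keep only the
            # requested neighborhood size when one was given.
            if (selection_k is not None and record["scope"] == "k"
                    and record["scope_value"] != selection_k):
                continue
            values[record["method"]].append(record["value"])
        if not values:
            raise ValueError(f"No observations for metric {metric!r}.")
        return {method: mean(vals) for method, vals in values.items()}

    primary = method_means(selection_metric)
    secondary = [method_means(metric) for metric in tie_breakers]

    def sort_key(method):
        # Primary mean first, then each tie-breaker mean in the given order.
        return (primary[method],
                *(means.get(method, float("-inf")) for means in secondary))

    return sorted(primary, key=sort_key, reverse=True)


def make(method, metric, value, k):
    return {"method": method, "metric": metric, "value": value,
            "scope": "k", "scope_value": k}


records = [
    make("PCA", "trustworthiness", 0.90, 5),
    make("PCA", "trustworthiness", 0.80, 10),
    make("Isomap", "trustworthiness", 0.90, 5),
    make("Isomap", "trustworthiness", 0.95, 10),
    make("PCA", "continuity", 0.85, 5),
    make("Isomap", "continuity", 0.88, 5),
]

overall = rank_by_mean(records, "trustworthiness")
# -> ['Isomap', 'PCA']  (means over both k values: 0.925 vs 0.85)

at_k5 = rank_by_mean(records, "trustworthiness", selection_k=5,
                     tie_breakers=["continuity"])
# -> ['Isomap', 'PCA']  (primary tie at 0.90, broken by continuity)
```

In the sample data the two methods tie on trustworthiness at k=5, so the continuity tie-breaker decides the order. Without selection_k, the mean over all recorded neighborhood sizes is compared instead.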

See also

collect: Gather cached metric observations before ranking.
to_frame: Inspect the underlying long-form metric observations directly.
coco_pipe.dim_reduction.core.DimReduction.score: Produces the metric records that feed into ranking.

Examples

>>> import numpy as np
>>> from coco_pipe.dim_reduction import DimReduction
>>> from coco_pipe.dim_reduction.evaluation import MethodSelector
>>> X = np.random.RandomState(0).randn(20, 4)
>>> reducers = [DimReduction("PCA", n_components=2)]
>>> reducer = reducers[0]
>>> embedding = reducer.fit_transform(X)
>>> reducer.score(embedding, X=X, k_values=[5])
>>> ranked = (
...     MethodSelector(reducers)
...     .collect()
...     .rank_methods(
...         "trustworthiness",
...         selection_k=5,
...     )
... )
>>> ranked.iloc[0]["method"] == reducer.method
True