.. _decoding-guide:

================================
Building and Running Experiments
================================

.. _decoding-experiment:

The ``Experiment`` Orchestrator
===============================

``coco_pipe.decoding.Experiment`` is the main entry point for all decoding
experiments. It validates configuration, orchestrates the outer CV loop, and
returns a fully populated ``ExperimentResult``.

---

1. Initialization
-----------------

.. code-block:: python

    from coco_pipe.decoding import Experiment, ExperimentConfig
    from coco_pipe.decoding.configs import ClassicalModelConfig, CVConfig

    config = ExperimentConfig(
        task="classification",
        models={"lr": ClassicalModelConfig(estimator="LogisticRegression")},
        metrics=["accuracy"],
        cv=CVConfig(strategy="stratified", n_splits=5),
    )

    exp = Experiment(config)

At construction time, ``Experiment.__init__`` immediately:

1. Resolves all model specs from ``ESTIMATOR_SPECS``.
2. Validates task/metric/model compatibility (raises ``ValueError`` if any
   combination is invalid).
3. Propagates the master ``random_state`` to all sub-configs.

---

2. Running an Experiment
------------------------

.. code-block:: python

    result = exp.run(
        X, y,
        groups=None,                # or np.ndarray of group labels
        sample_ids=None,            # or array of unique sample identifiers
        sample_metadata=None,       # or dict/DataFrame with Subject, Session, ...
        feature_names=None,         # or list of feature name strings
        time_axis=None,             # or np.ndarray of timepoints for 3D inputs
        observation_level="epoch",  # or "trial", "subject", etc.
        inferential_unit=None,      # auto-inferred from metadata
    )

2.1 ``X`` and ``y``
~~~~~~~~~~~~~~~~~~~

- ``X``: 2D array ``(n_samples, n_features)`` for classical models, or 3D array
  ``(n_samples, n_channels, n_times)`` for temporal estimators.
- ``y``: 1D array ``(n_samples,)`` of class labels (classification) or
  continuous values (regression).

2.2 ``sample_metadata``
~~~~~~~~~~~~~~~~~~~~~~~

A dict or DataFrame with columns for each metadata variable.
**Must include** ``Subject`` **and** ``Session`` (capitalized) when the outer CV
uses a group key. Additional columns (e.g., ``Site``, ``Age``) are stored in
predictions and splits for downstream analysis.

.. code-block:: python

    sample_metadata = {
        "Subject": subject_ids,  # unique subject identifiers
        "Session": session_ids,  # recording session identifiers
        "Site": site_ids,        # optional acquisition site
    }

2.3 ``observation_level``
~~~~~~~~~~~~~~~~~~~~~~~~~

A string label stored in ``result.meta["observation_level"]``. It describes what
each row of ``X`` represents (``"epoch"``, ``"trial"``, ``"subject"``, etc.).
This metadata does not affect fitting but documents the result for downstream
analysis and reporting.

---

3. Per-Fold Pipeline
--------------------

For each outer CV fold, ``Experiment`` executes the following sequence:

1. **Split**: divide ``X``, ``y``, and metadata into training and test
   partitions.
2. **Validate fold integrity**: check for degenerate folds (empty partitions,
   single-class training sets for classification).
3. **Build pipeline**: create a ``sklearn.pipeline.Pipeline`` with steps:
   ``scaler → feature_selector → model``. Each step is instantiated fresh for
   this fold.
4. **Wrap with tuning**: if ``TuningConfig.enabled``, wrap the pipeline in
   ``GridSearchCV`` or ``RandomizedSearchCV``.
5. **Fit**: call ``pipeline.fit(X_train, y_train)`` (with groups if required).
6. **Calibrate**: if ``CalibrationConfig.enabled``, wrap in
   ``CalibratedClassifierCV`` and refit with calibration folds.
7. **Score**: compute all requested metrics on ``X_test``.
8. **Extract diagnostics**: feature importances, predictions, timing, warnings.

---

4. Parallel Execution
---------------------

.. code-block:: python

    config = ExperimentConfig(
        ...,
        n_jobs=4,  # number of parallel outer CV jobs
    )
    result = Experiment(config).run(X, y)

``n_jobs`` controls the number of parallel outer-fold evaluations via ``joblib``.
For exact reproducibility, use ``n_jobs=1`` (see :ref:`decoding-reproducibility`).

---

5. Save and Load
----------------

.. code-block:: python

    # Save result to JSON
    path = result.save("results/my_experiment.json")

    # Load from JSON
    from coco_pipe.decoding.result import ExperimentResult
    loaded = ExperimentResult.load(path)

The result is serialized as a self-contained JSON payload (schema version
``decoding_result_v1``), including the config, metadata, per-fold outputs, and
provenance information.

---

6. Configuration Reference
--------------------------

See :ref:`decoding-configs` for a full listing of all configuration classes. The
most important fields on ``ExperimentConfig``:

.. list-table::
   :header-rows: 1
   :widths: 25 75

   * - Field
     - Description
   * - ``task``
     - ``"classification"`` or ``"regression"``.
   * - ``models``
     - Dict mapping model names to model configs.
   * - ``metrics``
     - List of metric keys (validated against the task and model capabilities).
   * - ``cv``
     - ``CVConfig`` controlling the outer cross-validation loop.
   * - ``tuning``
     - ``TuningConfig`` for hyperparameter search.
   * - ``feature_selection``
     - ``FeatureSelectionConfig`` for filter/wrapper feature selection.
   * - ``calibration``
     - ``CalibrationConfig`` for probability calibration.
   * - ``statistical_assessment``
     - ``StatisticalAssessmentConfig`` for permutation/binomial testing.
   * - ``use_scaler``
     - Whether to prepend a ``StandardScaler`` to the pipeline.
   * - ``n_jobs``
     - Number of parallel outer CV jobs.
   * - ``random_state``
     - Master seed for reproducibility.
   * - ``tag``
     - Descriptive label stored in the result metadata.

.. _decoding-configs:

Configuration Reference
=======================

All experiment configuration is declarative and Pydantic-validated. Every config
class uses ``extra="forbid"``, so misspelled or unsupported field names raise a
``ValidationError`` immediately, before any training starts.

---

1. ``ExperimentConfig``
-----------------------

Top-level configuration for a decoding experiment.

.. code-block:: python

    from coco_pipe.decoding.configs import ExperimentConfig

    config = ExperimentConfig(
        task="classification",          # required: "classification" or "regression"
        models={"lr": ...},             # required: dict of model configs
        metrics=["accuracy"],           # default: task-appropriate defaults
        cv=CVConfig(...),               # default: StratifiedKFold(5)
        tuning=TuningConfig(...),       # default: disabled
        feature_selection=FeatureSelectionConfig(...),  # default: disabled
        reducer=ReducerConfig(...),     # default: disabled (in-pipeline reduction)
        calibration=CalibrationConfig(...),             # default: disabled
        statistical_assessment=StatisticalAssessmentConfig(...),  # default: disabled
        grids={"lr": {"C": [0.1, 1.0]}},  # hyperparameter grids for tuning
        use_scaler=True,                # prepend StandardScaler to pipeline
        n_jobs=1,                       # outer CV parallelism
        verbose=False,
        tag="my_experiment",            # descriptive label in result metadata
        random_state=42,
    )

---

2. ``CVConfig``
---------------

Controls the outer cross-validation loop.

.. code-block:: python

    from coco_pipe.decoding.configs import CVConfig

    cv = CVConfig(
        strategy="stratified_group_kfold",
        n_splits=5,                 # also the number of groups left out for "leave_p_out"
        group_key="Subject",        # column name in sample_metadata
        test_size=0.2,              # for "split" / "group_shuffle_split" only
        stratify=True,              # for "split" + classification only
        auto_reduce_n_splits=True,  # shrink n_splits if too few groups
        random_state=42,
    )

See :ref:`decoding-cv` for a complete strategy guide.

---

3. ``ClassicalModelConfig``
---------------------------

Configures a classical scikit-learn estimator.

.. code-block:: python

    from coco_pipe.decoding.configs import ClassicalModelConfig

    model = ClassicalModelConfig(
        estimator="LogisticRegression",  # key in ESTIMATOR_SPECS
        params={"C": 1.0, "max_iter": 200},
    )

Short-form aliases are also available for common estimators:

.. code-block:: python

    from coco_pipe.decoding.configs import LogisticRegressionConfig

    model = LogisticRegressionConfig(C=1.0, max_iter=200)

---

4. ``TemporalDecoderConfig``
----------------------------

Wraps a classical base estimator for 3D temporal inputs.

.. code-block:: python

    from coco_pipe.decoding.configs import TemporalDecoderConfig, ClassicalModelConfig

    model = TemporalDecoderConfig(
        wrapper="sliding",  # or "generalizing"
        base=ClassicalModelConfig(estimator="LogisticRegression"),
        scoring="accuracy",
        n_jobs=-1,
    )

Requires ``mne`` as an optional dependency.

---

5. ``TuningConfig``
-------------------

Controls hyperparameter search.

.. code-block:: python

    from coco_pipe.decoding.configs import TuningConfig, CVConfig

    tuning = TuningConfig(
        enabled=True,
        search_type="grid",             # or "random"
        scoring="accuracy",
        n_iter=20,                      # for "random" search only
        n_jobs=1,
        refit=True,
        cv=CVConfig(strategy="stratified", n_splits=3),  # inner CV
        allow_nongroup_inner_cv=False,  # leakage guard
        random_state=42,
    )

---

6. ``FeatureSelectionConfig``
-----------------------------

.. code-block:: python

    from coco_pipe.decoding.configs import FeatureSelectionConfig, CVConfig

    fs = FeatureSelectionConfig(
        enabled=True,
        method="k_best",      # or "sfs"
        n_features=20,
        scoring="accuracy",   # scoring criterion for SFS inner CV
        cv=CVConfig(strategy="stratified", n_splits=3),  # SFS inner CV
        direction="forward",  # for SFS: "forward" or "backward"
        allow_nongroup_inner_cv=False,
    )

---

7. ``CalibrationConfig``
------------------------

Enables probability calibration inside the training path.

.. code-block:: python

    from coco_pipe.decoding.configs import CalibrationConfig, CVConfig

    calibration = CalibrationConfig(
        enabled=True,
        method="sigmoid",  # or "isotonic"
        cv=CVConfig(strategy="stratified", n_splits=3),
        allow_nongroup_inner_cv=False,
    )

---

8. ``StatisticalAssessmentConfig``
----------------------------------

.. code-block:: python

    from coco_pipe.decoding.configs import (
        StatisticalAssessmentConfig, ChanceAssessmentConfig, ConfidenceIntervalConfig
    )

    assessment = StatisticalAssessmentConfig(  # pass as statistical_assessment=assessment
        enabled=True,
        random_state=42,
        unit_of_inference="group_mean",  # "sample", "group_mean", "group_majority", "custom"
        chance=ChanceAssessmentConfig(
            method="permutation",            # or "binomial", "auto"
            n_permutations=1000,
            p0=None,                         # required for "binomial"
            temporal_correction="max_stat",  # "max_stat", "fdr_bh", "none"
            store_null_distribution=False,
        ),
        confidence_intervals=ConfidenceIntervalConfig(
            alpha=0.05,
            method="clopper_pearson",        # or "wilson"
        ),
    )

---

9. Foundation Model Configs
---------------------------

.. code-block:: python

    from coco_pipe.decoding.configs import (
        FoundationEmbeddingModelConfig, FrozenBackboneDecoderConfig,
        NeuralFineTuneConfig, LoRAConfig, QuantizationConfig,
        DeviceConfig, CheckpointConfig,
    )

    # Frozen embedding extractor
    embed_cfg = FoundationEmbeddingModelConfig(
        backend="braindecode",  # "auto" (default), "braindecode", "hugging_face"
        model_key="labram",     # a registered model; see list_foundation_models()
        pooling="mean",         # "mean" or "flatten"
        cache_embeddings=True,
        normalize_embeddings=True,
    )

    # Full / parameter-efficient neural fine-tuning
    ft_cfg = NeuralFineTuneConfig(
        backend="hugging_face",
        model_key="reve",
        input_kind="epoched",   # "temporal", "epoched", "tokens"
        train_mode="qlora",     # "full", "frozen", "linear_probe", "lora", "qlora"
        lora=LoRAConfig(r=16, alpha=32),
        quantization=QuantizationConfig(enabled=True, load_in_4bit=True),
        device=DeviceConfig(device="auto", precision="bf16"),  # "fp32", "fp16", "bf16"
        checkpoints=CheckpointConfig(save="best"),             # "none", "best", "last", "all"
    )

Discover available backbones and their capabilities with
:func:`~coco_pipe.decoding.list_foundation_models` and
:func:`~coco_pipe.decoding.get_foundation_model_spec`.

.. _decoding-result:

``ExperimentResult`` API
========================

``ExperimentResult`` is the structured container returned by
``Experiment.run()``. It provides 20+ accessor methods for tidy-data inspection,
diagnostic reporting, and statistical inference, all without rerunning the
experiment.

---

1. Structure
------------

.. code-block:: python

    result.raw     # per-model dict of fold outputs
    result.meta    # environment provenance, task, model names, capabilities
    result.config  # original ExperimentConfig

---

2. Prediction Accessors
-----------------------

.. code-block:: python

    # All out-of-fold predictions in tidy long form
    preds = result.get_predictions()
    # columns: Model, Fold, SampleIndex, SampleID, Group, y_true, y_pred
    #   + y_proba_0, y_proba_1, ... (if probabilities available)
    #   + Subject, Session, Site (from sample_metadata)
    #   + Time (sliding) or TrainTime, TestTime (generalizing)

---

3. Score Accessors
------------------

.. code-block:: python

    # Per-fold, per-metric scores
    scores = result.get_detailed_scores()
    # columns: Model, Fold, Metric, Value, Time (if temporal)

    # Fold-level split information
    splits = result.get_splits(with_metadata=True)

    # Fit/predict/score timing and convergence warnings
    fit_diag = result.get_fit_diagnostics()

---

4. Curve Diagnostics
--------------------

.. code-block:: python

    # ROC curves (binary or one-vs-rest multiclass)
    roc = result.get_roc_curve()
    # columns: Model, Fold, Class, FPR, TPR, Threshold, AUC

    # Precision-recall curves
    pr = result.get_pr_curve()
    # columns: Model, Fold, Class, Precision, Recall, Threshold

    # Calibration (reliability) curves
    cal = result.get_calibration_curve()

    # Probability quality summary (log-loss + Brier per fold)
    prob_diag = result.get_probability_diagnostics()

    # Summary statistics for ROC AUC
    roc_summary = result.get_roc_auc_summary()

    # Summary statistics for PR AUC
    pr_summary = result.get_pr_auc_summary()

---

5. Confusion Matrices
---------------------

.. code-block:: python

    # Per-fold confusion matrices in long form
    cm = result.get_confusion_matrices(normalize=True)
    # columns: Model, Fold, TrueLabel, PredLabel, Count

    # Pooled (over folds) confusion matrix
    pooled_cm = result.get_pooled_confusion_matrix(normalize="true")

---

6. Temporal Accessors
---------------------

.. code-block:: python

    # Score summary per timepoint (sliding only)
    temporal = result.get_temporal_score_summary()
    # columns: Model, Metric, Time, MeanScore, StdScore

    # Generalization matrix: shape (n_train_times, n_test_times)
    matrix = result.get_generalization_matrix("accuracy")
    # or long form:
    matrix_long = result.get_generalization_matrix("accuracy", long=True)

---

7. Statistical Inference
------------------------

.. code-block:: python

    # Full-pipeline or lightweight permutation/binomial assessment
    assessment = result.get_statistical_assessment()

    # Lightweight (fixed-prediction, fast, biased)
    assessment_fast = result.get_statistical_assessment(lightweight=True, metric="accuracy")

    # Bootstrap CI over independent units
    ci = result.get_bootstrap_confidence_intervals(
        metric="accuracy",
        unit="Subject",
        n_bootstraps=2000,
        ci=0.95,
    )

    # Null distribution (if stored via store_null_distribution=True)
    nulls = result.get_statistical_nulls()

---

8. Model Comparison
-------------------

.. code-block:: python

    # Paired permutation test between two models (in-result)
    paired = result.compare_models_paired("lr", "svm", metric="accuracy", unit="Subject")

    # All pairwise comparisons with correction
    all_pairs = result.compare_models(metric="accuracy", correction="fdr_bh")

---

9. Feature Importances
----------------------

.. code-block:: python

    # Mean ± std feature importance across folds
    importances = result.get_feature_importances()
    # columns: FeatureName, MeanImportance, StdImportance

    # Per-fold importances
    fold_imp = result.get_feature_importances(fold_level=True)

    # Ranked importances (descending by mean)
    ranked = result.get_feature_importances(rank=True)

---

10. Feature Selection Accessors
-------------------------------

.. code-block:: python

    # Selected features per fold
    selected = result.get_selected_features(ordered=True)

    # Feature stability: selection rate across folds
    stability = result.get_feature_stability()

    # Per-fold univariate feature scores (k_best only)
    scores = result.get_feature_scores(with_pvalues=True)

---

11. Hyperparameter Tuning
-------------------------

.. code-block:: python

    # Best hyperparameters per fold
    best = result.get_best_params()

    # Full grid search results
    grid = result.get_search_results()

---

12. Model Artifact Metadata
---------------------------

.. code-block:: python

    # Neural model training history, checkpoints, etc.
    artifacts = result.get_model_artifacts()

---

13. Serialization
-----------------

.. code-block:: python

    # Serialize to JSON-compatible payload
    payload = result.to_payload()

    # Save to file
    path = result.save("results/my_result.json")

    # Load from file
    from coco_pipe.decoding.result import ExperimentResult
    loaded = ExperimentResult.load("results/my_result.json")

.. _decoding-metrics:

Metric Registry
===============

All metrics are registered in ``coco_pipe.decoding._metrics.METRIC_REGISTRY``.
Metric/task compatibility is enforced at config validation time, before any
model is trained, which prevents silent misuse of classification metrics for
regression tasks (or vice versa).

---

1. Registry API
---------------

.. code-block:: python

    from coco_pipe.decoding._metrics import (
        get_metric_spec, get_metric_names, get_metric_families, get_scorer,
        METRIC_REGISTRY,
    )

    # Inspect a single metric
    spec = get_metric_spec("accuracy")
    print(spec.name)               # "accuracy"
    print(spec.task)               # "classification"
    print(spec.family)             # "label"
    print(spec.response_method)    # "predict"
    print(spec.greater_is_better)  # True

    # List all classification metrics in the "threshold_sweep" family
    names = get_metric_names(task="classification", family="threshold_sweep")

    # Get a callable scorer
    scorer = get_scorer("f1")  # sklearn-compatible callable

Each ``MetricSpec`` contains:

.. list-table::
   :header-rows: 1
   :widths: 20 15 65

   * - Field
     - Type
     - Description
   * - ``name``
     - ``str``
     - Unique key in the registry.
   * - ``task``
     - ``str``
     - ``"classification"`` or ``"regression"``.
   * - ``scorer``
     - ``Callable``
     - ``(y_true, y_pred) → float``.
   * - ``response_method``
     - ``str``
     - ``"predict"`` | ``"proba"`` | ``"score"`` | ``"proba_or_score"``.
   * - ``family``
     - ``str``
     - Grouping for reporting (see below).
   * - ``greater_is_better``
     - ``bool``
     - Directionality for permutation p-values and Max-Stat correction.

---

2. Classification Metrics
-------------------------

2.1 Label Metrics (``family="label"``)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Require only ``predict`` output. Work with any classifier.

.. list-table::
   :header-rows: 1
   :widths: 30 70

   * - Metric
     - Description
   * - ``accuracy``
     - Fraction of correctly classified samples. Sensitive to class imbalance.
   * - ``balanced_accuracy``
     - Mean recall per class. Recommended over ``accuracy`` for imbalanced data.
   * - ``zero_one_loss``
     - Fraction misclassified. ``1 - accuracy``. ``greater_is_better=False``.
   * - ``hamming_loss``
     - Per-label Hamming loss (fraction of labels incorrectly predicted).

2.2 Confusion-Derived Metrics (``family="confusion"``)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Derived from the confusion matrix. Require only ``predict``.

.. list-table::
   :header-rows: 1
   :widths: 30 70

   * - Metric
     - Description
   * - ``f1``
     - Binary F1 score (harmonic mean of precision and recall).
   * - ``f1_macro``
     - Unweighted macro-average F1 across classes.
   * - ``f1_micro``
     - Micro-averaged F1, computed from TP/FP/FN counts pooled across classes.
   * - ``precision``
     - Positive predictive value. TP / (TP + FP).
   * - ``recall``
     - Sensitivity / true positive rate. TP / (TP + FN).
   * - ``sensitivity``
     - Synonym for recall. Binary only; raises ``ValueError`` for multiclass.
   * - ``specificity``
     - True negative rate. TN / (TN + FP). Binary only.
   * - ``jaccard``
     - Intersection-over-union for binary labels.
   * - ``matthews_corrcoef``
     - Matthews correlation coefficient. Balanced for all class distributions.
   * - ``cohen_kappa``
     - Agreement corrected for chance. Range [-1, 1].

2.3 Threshold-Sweep Metrics (``family="threshold_sweep"``)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Require probability or decision scores. Use ``predict_proba`` when available,
``decision_function`` as fallback for binary classifiers.

.. list-table::
   :header-rows: 1
   :widths: 30 70

   * - Metric
     - Description
   * - ``roc_auc``
     - Area under the ROC curve (binary OvR). Independent of the decision threshold.
   * - ``roc_auc_ovr_weighted``
     - Support-weighted average of one-vs-rest AUCs for multiclass.
   * - ``average_precision``
     - Area under the PR curve using sklearn's non-interpolated average precision (binary).
   * - ``pr_auc``
     - Trapezoidal AUC of the precision-recall curve. Preferred over AP when positive fraction is small.

2.4 Probability-Score Metrics (``family="score_probability"``)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Require ``predict_proba``. Enable calibration diagnostics.

.. list-table::
   :header-rows: 1
   :widths: 30 70

   * - Metric
     - Description
   * - ``log_loss``
     - Cross-entropy loss. Lower is better (``greater_is_better=False``).
   * - ``brier_score``
     - Mean squared error of probability predictions. Lower is better.

---

3. Regression Metrics (``family="regression"``)
-----------------------------------------------

Require only ``predict`` output.

.. list-table::
   :header-rows: 1
   :widths: 30 70

   * - Metric
     - Description
   * - ``r2``
     - Coefficient of determination. 1.0 is perfect fit; can be negative.
   * - ``neg_mean_squared_error``
     - Negative MSE. Negated so higher = better for optimization consistency.
   * - ``neg_mean_absolute_error``
     - Negative MAE. More robust than MSE to outliers.
   * - ``neg_root_mean_squared_error``
     - Negative RMSE. Same units as the target variable.
   * - ``explained_variance``
     - Proportion of variance explained. Similar to R² but not penalized for bias.

---

4. Compatibility Rules
----------------------

The registry enforces three compatibility checks at ``ExperimentConfig``
validation time:

1. **Task mismatch**: A metric's ``task`` must match ``ExperimentConfig.task``.
2. **Proba requirement**: If ``response_method == "proba"``, the model must
   declare ``predict_proba`` **or** calibration must be enabled.
3. **Score requirement**: If ``response_method == "proba_or_score"``, the model
   must declare ``predict_proba`` **or** ``decision_function``.

These checks fire before any model is trained, producing a clear ``ValueError``
with the specific metric and model name.

---

5. Custom Metrics
-----------------

You can extend the registry for project-specific metrics:

.. code-block:: python

    from coco_pipe.decoding._metrics import METRIC_REGISTRY, MetricSpec
    from sklearn.metrics import top_k_accuracy_score
    from functools import partial

    top2 = partial(top_k_accuracy_score, k=2, labels=[0, 1, 2])

    METRIC_REGISTRY["top2_accuracy"] = MetricSpec(
        name="top2_accuracy",
        task="classification",
        scorer=top2,
        response_method="proba",
        family="label",
        greater_is_better=True,
    )

.. warning::

   Custom metrics are added to the in-process registry only. They are not
   persisted in saved ``ExperimentResult`` payloads and must be re-registered
   in any new Python process that loads existing results.

.. _decoding-feature-selection:

Feature Selection
=================

``coco_pipe.decoding`` supports two feature selection strategies that execute
**inside** each outer CV fold on the training partition only, guaranteeing zero
test-set leakage.

---

1. Filter Selection (``k_best``)
--------------------------------

``SelectKBest`` selects the top-``k`` features based on a univariate statistical
test. It has no inner CV loop. It is fast and well-suited for high-dimensional
data (e.g., many EEG channels/frequency bins) where a quick, interpretable
feature ranking is desired.

.. code-block:: python

    from coco_pipe.decoding.configs import (
        ExperimentConfig, CVConfig, ClassicalModelConfig, FeatureSelectionConfig
    )

    config = ExperimentConfig(
        task="classification",
        models={"lr": ClassicalModelConfig(estimator="LogisticRegression")},
        metrics=["accuracy"],
        cv=CVConfig(strategy="stratified_group_kfold", n_splits=5, group_key="Subject"),
        feature_selection=FeatureSelectionConfig(
            enabled=True,
            method="k_best",
            n_features=20,
            scoring="accuracy",  # optional; defaults to task-appropriate test
        ),
    )

1.1 Score Function
~~~~~~~~~~~~~~~~~~

For classification, the default univariate test is ``f_classif`` (ANOVA
F-value). For regression, it is ``f_regression``. Override via
``feature_selection.scoring``.

1.2 Accessing Feature Scores
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

After fitting, retrieve per-fold and per-feature scores:

.. code-block:: python

    feature_scores = result.get_feature_scores()
    # columns: FeatureName, Fold, Score, PValue

    # Mean score across folds
    mean_scores = feature_scores.groupby("FeatureName")["Score"].mean().sort_values(ascending=False)

---

2. Sequential Feature Selection (``sfs``)
-----------------------------------------

``SequentialFeatureSelector`` is a wrapper-based method. It iteratively adds
(forward SFS) or removes (backward SFS) features by evaluating the model's
cross-validated performance on each candidate feature set.
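The forward variant can be illustrated with a small, library-free sketch. It is
not the scikit-learn or ``coco_pipe`` implementation: ``forward_select``,
``toy_score``, and the ``GAINS`` table are hypothetical names, and ``toy_score``
merely stands in for the cross-validated model score.

.. code-block:: python

    from typing import Callable, List, Sequence


    def forward_select(
        candidates: Sequence[str],
        score_fn: Callable[[List[str]], float],
        n_features: int,
    ) -> List[str]:
        """Greedy forward selection: add the best-scoring feature each round."""
        selected: List[str] = []
        remaining = list(candidates)
        while remaining and len(selected) < n_features:
            # Score every candidate set formed by adding one remaining feature.
            best = max(remaining, key=lambda f: score_fn(selected + [f]))
            selected.append(best)
            remaining.remove(best)
        return selected


    # Toy stand-in for a cross-validated score (assumption for illustration):
    # each feature has a fixed gain, and "b" is redundant once "a" is selected.
    GAINS = {"a": 0.30, "b": 0.25, "c": 0.10, "d": 0.02}


    def toy_score(features: List[str]) -> float:
        score = sum(GAINS[f] for f in features)
        if "a" in features and "b" in features:
            score -= 0.20  # redundancy penalty
        return score


    chosen = forward_select(list(GAINS), toy_score, n_features=2)
    print(chosen)  # ['a', 'c']

In this toy setup a univariate ranking would keep ``a`` and ``b`` (the two
largest individual gains), whereas the greedy loop picks ``a`` and then ``c``
because ``b`` adds little once ``a`` is present. Each round scores one candidate
set per remaining feature, which is where the cost of SFS comes from.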
Because SFS uses the model's predictive performance as the selection criterion,
it can account for feature interactions and redundancy that univariate filters
miss, but it is significantly more expensive.

.. code-block:: python

    config = ExperimentConfig(
        task="classification",
        models={"lr": ClassicalModelConfig(estimator="LogisticRegression")},
        metrics=["balanced_accuracy"],
        cv=CVConfig(strategy="stratified_group_kfold", n_splits=5, group_key="Subject"),
        feature_selection=FeatureSelectionConfig(
            enabled=True,
            method="sfs",
            n_features=10,
            scoring="balanced_accuracy",  # criterion for SFS inner evaluation
            cv=CVConfig(strategy="stratified_group_kfold", n_splits=3, group_key="Subject"),
            direction="forward",          # or "backward"
        ),
    )

2.1 Inner CV for SFS
~~~~~~~~~~~~~~~~~~~~

SFS requires an inner CV loop to evaluate candidate feature sets. When omitted,
``coco_pipe.decoding`` derives the inner SFS CV from:

1. ``tuning.cv`` if tuning is enabled.
2. The outer CV family (group-based if outer is group-based).

When the outer CV is group-based, the SFS inner CV is automatically group-based.
Overriding requires ``allow_nongroup_inner_cv=True``.

2.2 Group-Aware SFS
~~~~~~~~~~~~~~~~~~~

``coco_pipe.decoding`` uses scikit-learn metadata routing to pass the outer-fold
training groups into the SFS inner CV. This requires ``scikit-learn >= 1.6``.

2.3 SFS with Tuning
~~~~~~~~~~~~~~~~~~~

SFS combined with hyperparameter tuning evaluates feature subsets inside the
tuning inner folds. ``coco_pipe.decoding`` uses a ``sklearn.pipeline.Pipeline``
cache to avoid redundant refitting:

.. code-block:: python

    config = ExperimentConfig(
        ...,
        feature_selection=FeatureSelectionConfig(enabled=True, method="sfs", n_features=10),
        tuning=TuningConfig(enabled=True, scoring="accuracy"),
        grids={"lr": {"C": [0.1, 1.0, 10.0]}},
    )

.. warning::

   SFS + tuning is computationally intensive. Reduce the outer ``n_splits`` or
   the SFS inner ``n_splits`` for development runs.

---

3. Feature Stability Analysis
-----------------------------

For both ``k_best`` and ``sfs``, ``coco_pipe.decoding`` tracks which features
were selected in each fold. The stability score is the proportion of folds in
which a feature was selected:

.. code-block:: python

    stability = result.get_feature_stability()
    # columns: FeatureName, SelectionRate, MeanRank, StdRank

    # Most stable features
    top = stability.sort_values("SelectionRate", ascending=False).head(20)

.. note::

   Feature stability across folds is a measure of **generalizability**, not
   importance. A feature selected in all folds is a robust signal across the
   sampled subjects, regardless of its average selection score.

---

4. Selected Features per Fold
-----------------------------

.. code-block:: python

    selected = result.get_selected_features()
    # columns: FeatureName, Fold, Rank

    # Features selected in every fold. Count the folds actually present rather
    # than config.cv.n_splits, which can differ from the realized fold count
    # (e.g. with auto_reduce_n_splits or "leave_p_out").
    n_folds = selected["Fold"].nunique()
    universal = selected.groupby("FeatureName")["Fold"].nunique()
    universal = universal[universal == n_folds].index.tolist()

---

5. Compatibility Notes
----------------------

- Feature selection is only valid for 2D tabular inputs
  (``input_kind in {"tabular_2d", "embedding_2d"}``).
- Feature selection is **incompatible** with temporal estimators
  (``SlidingEstimator``, ``GeneralizingEstimator``). The registry blocks this at
  validation time.
- ``k_best`` does not support ranked importances beyond fold scores/p-values.
  For importance-based selection, use tree ensemble importances via
  ``result.get_feature_importances()``.
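
To make the selection-rate definition from the Feature Stability Analysis
section concrete, here is a minimal, library-free sketch. It is an illustration
only and not how ``coco_pipe.decoding`` computes stability: ``selection_rate``
and the per-fold feature names are hypothetical.

.. code-block:: python

    from collections import Counter
    from typing import Dict, Iterable, List


    def selection_rate(selected_per_fold: List[Iterable[str]]) -> Dict[str, float]:
        """Fraction of folds in which each feature was selected."""
        n_folds = len(selected_per_fold)
        counts: Counter = Counter()
        for fold_features in selected_per_fold:
            # set() guards against a feature being listed twice in one fold.
            counts.update(set(fold_features))
        return {name: counts[name] / n_folds for name in sorted(counts)}


    # Hypothetical selections from four outer folds.
    folds = [
        ["alpha_Fz", "beta_Cz", "theta_Pz"],
        ["alpha_Fz", "beta_Cz"],
        ["alpha_Fz", "gamma_Oz"],
        ["alpha_Fz", "beta_Cz", "gamma_Oz"],
    ]

    rates = selection_rate(folds)
    universal = [name for name, rate in rates.items() if rate == 1.0]
    print(rates)      # alpha_Fz: 1.0, beta_Cz: 0.75, gamma_Oz: 0.5, theta_Pz: 0.25
    print(universal)  # ['alpha_Fz']

A rate of 1.0 corresponds to a feature selected in every fold, which is the same
set the "universal" filter in the Selected Features per Fold section recovers.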