coco_pipe.io.utils
==================

.. py:module:: coco_pipe.io.utils

.. autoapi-nested-parse::

   Miscellaneous IO helpers: BIDS loading, stratified sampling, and table utilities.

   This module is intentionally thin: heavy quality logic lives in
   :mod:`coco_pipe.io.quality`; data-structure definitions live in
   :mod:`coco_pipe.io.structures`.  Everything here is either a small utility
   (``read_table``, ``normalize_subject_value``) or a sampling helper
   (``make_strata``, ``sample_indices``) with no dependency on the QC pipeline.


Attributes
----------

.. autoapisummary::

   coco_pipe.io.utils.logger
   coco_pipe.io.utils.mne
   coco_pipe.io.utils.BIDSPath
   coco_pipe.io.utils.read_raw_bids


Functions
---------

.. autoapisummary::

   coco_pipe.io.utils.make_strata
   coco_pipe.io.utils.row_quality_score
   coco_pipe.io.utils.sample_indices
   coco_pipe.io.utils.split_column
   coco_pipe.io.utils.read_bids_entry
   coco_pipe.io.utils.load_participants_tsv
   coco_pipe.io.utils.detect_subjects
   coco_pipe.io.utils.detect_sessions
   coco_pipe.io.utils.detect_runs
   coco_pipe.io.utils.normalize_subject_value
   coco_pipe.io.utils.compute_feature_missingness
   coco_pipe.io.utils.compute_constant_feature_summary


Module Contents
---------------

.. py:data:: logger

.. py:data:: mne
   :value: None


.. py:data:: BIDSPath
   :value: None


.. py:data:: read_raw_bids
   :value: None


.. py:function:: make_strata(df, covariates, n_bins = 5, binning = 'quantile')

   Create a single stratification label from multiple covariates.
   Numeric covariates are binned first, using ``n_bins`` and the ``binning``
   strategy (``'quantile'`` by default).
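
   The binning-then-combining idea can be sketched as follows. This is a
   minimal illustration and not the function's actual implementation:
   ``make_strata_sketch`` is a hypothetical stand-in, and the ``col=bin`` label
   format, the ``|`` separator, and the equal-width fallback for any
   non-``'quantile'`` value of ``binning`` are all assumptions.

   .. code-block:: python

      import pandas as pd
      from pandas.api.types import is_numeric_dtype


      def make_strata_sketch(df, covariates, n_bins=5, binning="quantile"):
          """Hypothetical sketch: one combined label per row."""
          parts = []
          for col in covariates:
              values = df[col]
              if is_numeric_dtype(values):
                  if binning == "quantile":
                      # Equal-frequency bins; tied bin edges are dropped.
                      binned = pd.qcut(values, q=n_bins, labels=False,
                                       duplicates="drop")
                  else:
                      # Assumed fallback: equal-width bins over the range.
                      binned = pd.cut(values, bins=n_bins, labels=False)
                  parts.append(col + "=" + binned.astype("Int64").astype(str))
              else:
                  # Non-numeric covariates are used as-is.
                  parts.append(col + "=" + values.astype(str))
          # Join the per-covariate labels into a single string per row.
          strata = parts[0]
          for part in parts[1:]:
              strata = strata + "|" + part
          return strata


      demo = pd.DataFrame({"age": [20, 30, 40, 50], "sex": ["F", "M", "F", "M"]})
      labels = make_strata_sketch(demo, ["age", "sex"], n_bins=2)

   A combined label of this kind can be passed as the stratification column to
   a splitter, so that every covariate is balanced at once.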


.. py:function:: row_quality_score(df, exclude_cols = None, count_zero = True, normalize = False)

   Calculate per-row badness from NaN, Inf, and optionally zero counts.

   Higher values indicate worse quality. With ``normalize=True``, divide by
   the number of evaluated numeric columns so scores are in ``[0, 1]``.

   :param df: Input rows to score.
   :param exclude_cols: Columns to exclude before selecting numeric values.
   :param count_zero: Whether zero values contribute to the badness score.
   :param normalize: Whether to divide counts by the number of evaluated numeric columns.

   :returns: Row-aligned badness scores. Lower values indicate better quality.
   :rtype: pandas.Series
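
   The scoring rule described above can be approximated with the sketch below.
   ``row_quality_score_sketch`` is a hypothetical re-implementation written for
   illustration under the documented semantics; the real function may differ in
   details such as dtype handling.

   .. code-block:: python

      import numpy as np
      import pandas as pd


      def row_quality_score_sketch(df, exclude_cols=None, count_zero=True,
                                   normalize=False):
          """Hypothetical sketch: count bad cells per row (higher is worse)."""
          kept = df.drop(columns=list(exclude_cols or []))
          numeric = kept.select_dtypes(include="number")
          values = numeric.to_numpy(dtype=float)
          bad = ~np.isfinite(values)      # NaN, +Inf, and -Inf
          if count_zero:
              bad |= values == 0          # optionally treat zeros as bad
          score = bad.sum(axis=1).astype(float)
          if normalize and numeric.shape[1] > 0:
              score /= numeric.shape[1]   # fraction of bad numeric cells
          return pd.Series(score, index=df.index)


      demo = pd.DataFrame({
          "id": ["a", "b", "c"],          # non-numeric, ignored
          "x": [1.0, np.nan, 0.0],
          "y": [np.inf, 2.0, 0.0],
      })
      raw = row_quality_score_sketch(demo)
      scaled = row_quality_score_sketch(demo, normalize=True)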


.. py:function:: sample_indices(df, target, size_map, rng, replace, prefer_clean, exclude)

   Sample indices for each class based on ``size_map``.


.. py:function:: split_column(name, sep, reverse)

   Split a column name into ``(unit, feature)`` using ``sep`` and ``reverse``.
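
   One plausible reading of this behaviour is sketched below. The exact
   semantics are not documented here, so everything in the sketch is an
   assumption: ``split_column_sketch`` is a hypothetical helper that splits on
   the first occurrence of ``sep``, treats ``reverse=True`` as meaning the name
   is laid out as ``feature<sep>unit``, and raises when ``sep`` is absent.

   .. code-block:: python

      def split_column_sketch(name, sep="_", reverse=False):
          """Hypothetical sketch: return (unit, feature) from a column name."""
          head, found, tail = name.partition(sep)
          if not found:
              # Assumed behaviour; the real function may handle this differently.
              raise ValueError(f"separator {sep!r} not found in {name!r}")
          # Assumed: reverse=True means the name is written feature-first.
          return (tail, head) if reverse else (head, tail)


      unit_first = split_column_sketch("Fz_alpha", sep="_", reverse=False)
      feature_first = split_column_sketch("alpha_Fz", sep="_", reverse=True)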


.. py:function:: read_bids_entry(bids_path, is_pre_epoched, is_evoked, mode, window_length, stride, event_id = None, tmin = -0.2, tmax = 0.5, baseline = None, units = None)

.. py:function:: load_participants_tsv(root)

   Read ``participants.tsv`` and return a dict of the form
   ``{sub_id: {col: val, ...}}``.
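
   The returned shape can be illustrated with a standard-library sketch.
   ``load_participants_sketch`` is a hypothetical stand-in; it assumes the
   BIDS-mandated ``participant_id`` column is the key, keeps the ``sub-``
   prefix, and leaves all values as strings, none of which is confirmed for
   the real function.

   .. code-block:: python

      import csv
      from pathlib import Path


      def load_participants_sketch(root):
          """Hypothetical sketch: map participant_id to its other columns."""
          path = Path(root) / "participants.tsv"
          with path.open(newline="", encoding="utf-8") as handle:
              reader = csv.DictReader(handle, delimiter="\t")
              return {
                  row["participant_id"]: {
                      key: val for key, val in row.items()
                      if key != "participant_id"
                  }
                  for row in reader
              }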


.. py:function:: detect_subjects(root)

.. py:function:: detect_sessions(root, subject)

.. py:function:: detect_runs(root, subject, session = None, task = None, datatype = 'eeg')

   Detect available runs for a given subject/session/task.
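
   As a rough illustration of run detection, the sketch below scans a BIDS
   datatype folder for ``run-<index>`` entities in file names. It is an
   assumption-laden stand-in, not the function's actual logic:
   ``detect_runs_sketch`` is hypothetical, and it relies only on the standard
   BIDS layout ``sub-<label>[/ses-<label>]/<datatype>/``.

   .. code-block:: python

      import re
      from pathlib import Path

      _RUN_RE = re.compile(r"_run-(\d+)")


      def detect_runs_sketch(root, subject, session=None, task=None,
                             datatype="eeg"):
          """Hypothetical sketch: sorted, de-duplicated run labels."""
          folder = Path(root) / f"sub-{subject}"
          if session is not None:
              folder = folder / f"ses-{session}"
          folder = folder / datatype
          runs = set()
          for path in folder.glob(f"sub-{subject}_*"):
              # Skip files that belong to a different task, if one was given.
              if task is not None and f"_task-{task}_" not in path.name:
                  continue
              match = _RUN_RE.search(path.name)
              if match:
                  runs.add(match.group(1))
          return sorted(runs)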


.. py:function:: normalize_subject_value(value)

   Normalize a BIDS subject label to a zero-padded 4-digit string.

   The ``sub-`` prefix is stripped when present, while non-numeric labels are
   returned unchanged.

   :param value: Raw subject label from a metadata table or BIDS path component.
   :type value: object

   :returns: Normalized subject string.
   :rtype: str
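
   The documented rules can be sketched in a few lines.
   ``normalize_subject_sketch`` is a hypothetical re-implementation for
   illustration; how the real function treats whitespace, floats such as
   ``7.0``, or labels longer than four digits is not stated here, so those
   choices are assumptions.

   .. code-block:: python

      def normalize_subject_sketch(value):
          """Hypothetical sketch of the documented normalization rules."""
          text = str(value).strip()
          if text.startswith("sub-"):
              text = text[len("sub-"):]       # drop the BIDS prefix
          # Zero-pad purely numeric labels; leave everything else unchanged.
          return text.zfill(4) if text.isdigit() else text


      padded = normalize_subject_sketch("sub-12")
      from_int = normalize_subject_sketch(7)
      non_numeric = normalize_subject_sketch("sub-ctrl01")

   Normalizing both sides this way before a join avoids mismatches between a
   metadata table that stores ``12`` and a path component that reads
   ``sub-0012``.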


.. py:function:: compute_feature_missingness(df, feature_cols)

   Compute per-column missingness and non-finite rates.

   NaN values contribute only to the missingness metrics. Positive and
   negative infinity contribute only to the non-finite metrics.
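
   The separation between missing and non-finite values can be illustrated as
   follows. This is a sketch under stated assumptions:
   ``feature_missingness_sketch`` and its output column names (``n_missing``,
   ``missing_rate``, ``n_nonfinite``, ``nonfinite_rate``) are hypothetical and
   may not match what the real function returns.

   .. code-block:: python

      import numpy as np
      import pandas as pd


      def feature_missingness_sketch(df, feature_cols):
          """Hypothetical sketch: per-column NaN and +/-Inf counts and rates."""
          cols = list(feature_cols)
          values = df[cols].to_numpy(dtype=float)
          n_rows = max(len(df), 1)                   # avoid division by zero
          n_missing = np.isnan(values).sum(axis=0)   # NaN only
          n_nonfinite = np.isinf(values).sum(axis=0) # +Inf and -Inf only
          return pd.DataFrame(
              {
                  "n_missing": n_missing,
                  "missing_rate": n_missing / n_rows,
                  "n_nonfinite": n_nonfinite,
                  "nonfinite_rate": n_nonfinite / n_rows,
              },
              index=cols,
          )


      demo = pd.DataFrame({
          "x": [1.0, np.nan, np.inf, -np.inf],
          "y": [np.nan, np.nan, 0.0, 1.0],
      })
      summary = feature_missingness_sketch(demo, ["x", "y"])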


.. py:function:: compute_constant_feature_summary(df, feature_cols, tol = 1e-12)

   Compute per-column variance and constant-feature indicators.

   Standard deviations use the population definition (``ddof=0``). Entirely
   NaN columns are identified separately and are not marked constant.
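
   A minimal sketch of this summary is shown below. ``constant_feature_sketch``
   and its output column names are hypothetical, and the comparison
   ``std <= tol`` is an assumption about how ``tol`` is applied; only the use
   of ``ddof=0`` and the separate handling of all-NaN columns come from the
   description above.

   .. code-block:: python

      import numpy as np
      import pandas as pd


      def constant_feature_sketch(df, feature_cols, tol=1e-12):
          """Hypothetical sketch: flag near-constant and all-NaN columns."""
          data = df[list(feature_cols)].astype(float)
          std = data.std(axis=0, ddof=0)     # population std; NaNs are skipped
          all_nan = data.isna().all(axis=0)
          # An all-NaN column has std == NaN, so the comparison is already
          # False there; the explicit mask makes the intent obvious.
          is_constant = (std <= tol) & ~all_nan
          return pd.DataFrame({
              "variance": std ** 2,
              "std": std,
              "all_nan": all_nan,
              "is_constant": is_constant,
          })


      demo = pd.DataFrame({
          "a": [1.0, 1.0, 1.0],
          "b": [1.0, 2.0, 3.0],
          "c": [np.nan, np.nan, np.nan],
      })
      summary = constant_feature_sketch(demo, ["a", "b", "c"])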