coco_pipe.io.utils
==================

.. py:module:: coco_pipe.io.utils

.. autoapi-nested-parse::

   Miscellaneous IO helpers: BIDS loading, stratified sampling, and table utilities.

   This module is intentionally thin: heavy quality logic lives in
   :mod:`coco_pipe.io.quality`, and data-structure definitions live in
   :mod:`coco_pipe.io.structures`. Everything here is either a small utility
   (``read_table``, ``normalize_subject_value``) or a sampling helper
   (``make_strata``, ``sample_indices``) with no dependency on the QC pipeline.



Attributes
----------

.. autoapisummary::

   coco_pipe.io.utils.logger
   coco_pipe.io.utils.mne
   coco_pipe.io.utils.BIDSPath
   coco_pipe.io.utils.read_raw_bids


Functions
---------

.. autoapisummary::

   coco_pipe.io.utils.make_strata
   coco_pipe.io.utils.row_quality_score
   coco_pipe.io.utils.sample_indices
   coco_pipe.io.utils.split_column
   coco_pipe.io.utils.read_bids_entry
   coco_pipe.io.utils.load_participants_tsv
   coco_pipe.io.utils.detect_subjects
   coco_pipe.io.utils.detect_sessions
   coco_pipe.io.utils.detect_runs
   coco_pipe.io.utils.normalize_subject_value
   coco_pipe.io.utils.compute_feature_missingness
   coco_pipe.io.utils.compute_constant_feature_summary


Module Contents
---------------

.. py:data:: logger

.. py:data:: mne
   :value: None

.. py:data:: BIDSPath
   :value: None

.. py:data:: read_raw_bids
   :value: None

.. py:function:: make_strata(df, covariates, n_bins = 5, binning = 'quantile')

   Create a single stratification label from multiple covariates.

   Numeric covariates are binned (controlled by ``n_bins`` and ``binning``)
   before the per-covariate labels are combined.

.. py:function:: row_quality_score(df, exclude_cols = None, count_zero = True, normalize = False)

   Calculate per-row badness from NaN, Inf, and optionally zero counts.

   Higher values indicate worse quality. With ``normalize=True``, counts are
   divided by the number of evaluated numeric columns, so scores lie in ``[0, 1]``.

   :param df: Input rows to score.
   :param exclude_cols: Columns to exclude before selecting numeric values.
   :param count_zero: Whether zero values contribute to the badness score.
   :param normalize: Whether to divide counts by the number of evaluated numeric columns.

   :returns: Row-aligned badness scores. Lower values indicate better quality.
   :rtype: pandas.Series

.. py:function:: sample_indices(df, target, size_map, rng, replace, prefer_clean, exclude)

   Sample row indices for each class, with per-class sample sizes taken from ``size_map``.

.. py:function:: split_column(name, sep, reverse)

   Split a column name into a ``(unit, feature)`` pair using ``sep`` and ``reverse``.

.. py:function:: read_bids_entry(bids_path, is_pre_epoched, is_evoked, mode, window_length, stride, event_id = None, tmin = -0.2, tmax = 0.5, baseline = None, units = None)

.. py:function:: load_participants_tsv(root)

   Read ``participants.tsv`` and return a dict of the form ``{sub_id: {col: val, ...}}``.

.. py:function:: detect_subjects(root)

.. py:function:: detect_sessions(root, subject)

.. py:function:: detect_runs(root, subject, session = None, task = None, datatype = 'eeg')

   Detect available runs for a given subject, session, and task.

.. py:function:: normalize_subject_value(value)

   Normalize a BIDS subject label to a zero-padded 4-digit string.

   The ``sub-`` prefix is stripped when present, while non-numeric labels are
   returned unchanged.

   :param value: Raw subject label from a metadata table or BIDS path component.
   :type value: object

   :returns: Normalized subject string.
   :rtype: str

.. py:function:: compute_feature_missingness(df, feature_cols)

   Compute per-column missingness and non-finite rates.

   NaN values contribute only to the missingness metrics. Positive and negative
   infinity contribute only to the non-finite metrics.

.. py:function:: compute_constant_feature_summary(df, feature_cols, tol = 1e-12)

   Compute per-column variance and constant-feature indicators.

   Standard deviations use the population definition (``ddof=0``). Entirely NaN
   columns are identified separately and are not marked constant.
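The following sketch illustrates the covariate binning described for :py:func:`make_strata`. It uses plain pandas and is not the module's implementation. The following are assumptions made for illustration:

* the name ``make_strata_sketch``;
* the ``"|"`` separator between per-covariate labels;
* treating any ``binning`` value other than ``'quantile'`` as equal-width binning.

.. code-block:: python

   import pandas as pd


   def make_strata_sketch(df, covariates, n_bins=5, binning="quantile"):
       """Hypothetical sketch: combine binned covariates into one label per row."""
       parts = []
       for col in covariates:
           values = df[col]
           if pd.api.types.is_numeric_dtype(values):
               if binning == "quantile":
                   # Quantile bins; duplicates="drop" merges bins whose edges coincide
                   codes = pd.qcut(values, q=n_bins, labels=False, duplicates="drop")
               else:
                   # Assumption: any other mode means equal-width bins over the range
                   codes = pd.cut(values, bins=n_bins, labels=False)
               # Nullable integers keep missing values as <NA> instead of failing
               parts.append(codes.astype("Int64").astype(str))
           else:
               # Non-numeric covariates are used as-is
               parts.append(values.astype(str))
       # Assumption: "|" joins the per-covariate labels into one stratum label
       label = parts[0]
       for part in parts[1:]:
           label = label + "|" + part
       return label

For example, with ages 20, 30, 40, and 50, ``make_strata_sketch(df, ["age", "sex"], n_bins=2)`` splits age at its median. This yields labels such as ``"0|F"`` and ``"1|M"``.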
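The scoring rule documented for :py:func:`row_quality_score` can be sketched as follows. Each row counts its NaN, infinite, and (optionally) zero numeric entries, and the count can be divided by the number of evaluated columns. This is an illustrative reconstruction from the docstring, not the library code. The name ``row_quality_score_sketch`` is hypothetical, and silently ignoring unknown ``exclude_cols`` is an assumption.

.. code-block:: python

   import numpy as np
   import pandas as pd


   def row_quality_score_sketch(df, exclude_cols=None, count_zero=True, normalize=False):
       """Hypothetical sketch: per-row count of NaN/Inf (and optionally zero) values."""
       # Drop excluded columns first (assumption: unknown names are ignored)
       data = df.drop(columns=list(exclude_cols or []), errors="ignore")
       # Only numeric columns are evaluated
       numeric = data.select_dtypes(include="number")
       values = numeric.to_numpy(dtype=float, na_value=np.nan)
       # NaN and +/-inf are both "bad"
       bad = ~np.isfinite(values)
       if count_zero:
           bad = bad | (values == 0)
       score = bad.sum(axis=1)
       if normalize and numeric.shape[1] > 0:
           # Divide by the number of evaluated columns so scores lie in [0, 1]
           score = score / numeric.shape[1]
       return pd.Series(score, index=df.index)

Take a row holding ``0.0`` and ``0.0`` in its two numeric columns. It scores ``2`` by default, ``0`` with ``count_zero=False``, and ``1.0`` with ``normalize=True``.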
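A minimal sketch of the label normalization described for :py:func:`normalize_subject_value` follows. It rests on two assumptions:

* non-numeric labels are returned with the ``sub-`` prefix already removed;
* numeric inputs such as ``12`` are converted with ``str`` before padding.

The name ``normalize_subject_value_sketch`` is hypothetical.

.. code-block:: python

   def normalize_subject_value_sketch(value):
       """Hypothetical sketch: strip "sub-" and zero-pad numeric labels to 4 digits."""
       text = str(value)
       # Strip an optional BIDS "sub-" prefix
       if text.startswith("sub-"):
           text = text[len("sub-"):]
       # Zero-pad purely numeric labels to four digits; leave other labels as-is
       return text.zfill(4) if text.isdigit() else text

Note that ``str.zfill`` never truncates, so in this sketch a label with more than four digits keeps its full length.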
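The split between missingness and non-finite metrics in :py:func:`compute_feature_missingness` can be shown with a small numpy sketch. NaN is counted only as missing, and ``+inf``/``-inf`` only as non-finite. The following are assumptions and may differ from what the real function returns:

* the name ``compute_feature_missingness_sketch``;
* the output column names;
* the requirement that feature columns are numeric.

.. code-block:: python

   import numpy as np
   import pandas as pd


   def compute_feature_missingness_sketch(df, feature_cols):
       """Hypothetical sketch: per-column NaN and +/-inf counts and rates."""
       cols = list(feature_cols)
       # Assumption: feature columns are numeric; pd.NA is mapped to NaN
       block = df[cols].to_numpy(dtype=float, na_value=np.nan)
       n_rows = block.shape[0]
       n_missing = np.isnan(block).sum(axis=0)    # NaN only
       n_nonfinite = np.isinf(block).sum(axis=0)  # +inf and -inf only
       # On an empty frame the rates become NaN instead of dividing by zero
       denom = n_rows if n_rows else np.nan
       return pd.DataFrame(
           {
               "n_missing": n_missing,
               "missing_rate": n_missing / denom,
               "n_nonfinite": n_nonfinite,
               "nonfinite_rate": n_nonfinite / denom,
           },
           index=cols,
       )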
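The constant-feature rule documented for :py:func:`compute_constant_feature_summary` can be sketched with pandas. The sketch uses a population standard deviation (``ddof=0``) and flags a column as constant when its spread is within ``tol``. Entirely NaN columns are reported separately and never marked constant. The following are assumptions:

* the name ``compute_constant_feature_summary_sketch``;
* the output column names;
* the ``<=`` comparison against ``tol``;
* numeric feature columns.

.. code-block:: python

   import pandas as pd


   def compute_constant_feature_summary_sketch(df, feature_cols, tol=1e-12):
       """Hypothetical sketch: population std/variance and constant-feature flags."""
       block = df[list(feature_cols)]
       # Population standard deviation (ddof=0); NaNs are skipped by default
       std = block.std(ddof=0)
       all_nan = block.isna().all()
       # Constant means spread within tol; all-NaN columns are never constant
       is_constant = (std <= tol) & ~all_nan
       return pd.DataFrame(
           {
               "std": std,
               "variance": std ** 2,
               "all_nan": all_nan,
               "is_constant": is_constant,
           }
       )

For an all-NaN column the standard deviation is NaN, so the explicit ``all_nan`` flag is what distinguishes it from a genuinely constant column.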