coco_pipe.io.utils
Miscellaneous IO helpers — BIDS loading, stratified sampling, and table utilities.
This module is intentionally thin: heavy quality logic lives in
coco_pipe.io.quality; data-structure definitions live in
coco_pipe.io.structures. Everything here is either a small utility
(read_table, normalize_subject_value) or a sampling helper
(make_strata, sample_indices) with no dependency on the QC pipeline.
Attributes
Functions
| make_strata | Create a single stratification label from multiple covariates. |
| row_quality_score | Calculate per-row badness from NaN, Inf, and optionally zero counts. |
| sample_indices | Sample indices for each class based on size_map. |
| split_column | Split a column into (unit, feature) using sep and reverse. |
| read_bids_entry | |
| load_participants_tsv | Reads participants.tsv and returns dict: {sub_id: {col: val, ...}}. |
| detect_subjects | |
| detect_sessions | |
| detect_runs | Detect available runs for a given subject/session/task. |
| normalize_subject_value | Normalize a BIDS subject label to a zero-padded 4-digit string. |
| compute_feature_missingness | Compute per-column missingness and non-finite rates. |
| compute_constant_feature_summary | Compute per-column variance and constant-feature indicators. |
Module Contents
- coco_pipe.io.utils.logger
- coco_pipe.io.utils.mne = None
- coco_pipe.io.utils.BIDSPath = None
- coco_pipe.io.utils.read_raw_bids = None
- coco_pipe.io.utils.make_strata(df, covariates, n_bins=5, binning='quantile')
Create a single stratification label from multiple covariates. Numeric covariates are binned.
- Parameters:
df (pandas.DataFrame)
n_bins (int)
binning (str)
- Return type:
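To illustrate the idea behind make_strata, here is a minimal sketch, not the library's implementation. It assumes numeric covariates are binned with pandas.qcut when binning='quantile' and with equal-width pandas.cut otherwise, and that per-covariate labels are joined with an underscore. The name make_strata_sketch, the fallback binning, and the label format are all assumptions.

```python
import pandas as pd


def make_strata_sketch(df, covariates, n_bins=5, binning="quantile"):
    """Combine several covariates into one string label per row (illustrative only)."""
    parts = []
    for col in covariates:
        values = df[col]
        if pd.api.types.is_numeric_dtype(values):
            if binning == "quantile":
                # Quantile bins hold roughly equal numbers of rows.
                binned = pd.qcut(values, q=n_bins, labels=False, duplicates="drop")
            else:
                # Assumption: any other setting means equal-width bins.
                binned = pd.cut(values, bins=n_bins, labels=False)
            parts.append(binned.astype("Int64").astype(str))
        else:
            # Categorical covariates are used as-is.
            parts.append(values.astype(str))
    # Join the per-covariate labels row by row.
    label = parts[0]
    for part in parts[1:]:
        label = label.str.cat(part, sep="_")
    return label


demo = pd.DataFrame({"age": [20, 30, 40, 50], "sex": ["F", "M", "F", "M"]})
strata = make_strata_sketch(demo, ["age", "sex"], n_bins=2)
print(strata.tolist())
```

A combined label of this kind can be passed wherever a single stratification column is expected, for example as the stratify argument of a train/test splitter.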
- coco_pipe.io.utils.row_quality_score(df, exclude_cols=None, count_zero=True, normalize=False)
Calculate per-row badness from NaN, Inf, and optionally zero counts.
Higher values indicate worse quality. With normalize=True, divide by the number of evaluated numeric columns so scores are in [0, 1].
- Parameters:
df (pandas.DataFrame) – Input rows to score.
exclude_cols (list[str] | None) – Columns to exclude before selecting numeric values.
count_zero (bool) – Whether zero values contribute to the badness score.
normalize (bool) – Whether to divide counts by the number of evaluated numeric columns.
- Returns:
Row-aligned badness scores. Lower values indicate better quality.
- Return type:
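The scoring rule described for row_quality_score can be sketched as follows. This is a hedged approximation based only on the description above: the helper name row_badness_sketch is hypothetical, and returning a pandas.Series is an assumption, since the return type is not shown here.

```python
import numpy as np
import pandas as pd


def row_badness_sketch(df, exclude_cols=None, count_zero=True, normalize=False):
    """Count NaN, Inf, and optionally zero entries per row (illustrative only)."""
    numeric = df.drop(columns=exclude_cols or []).select_dtypes(include="number")
    values = numeric.to_numpy(dtype=float)
    bad = ~np.isfinite(values)  # True for NaN, +Inf, and -Inf
    if count_zero:
        bad |= values == 0
    score = bad.sum(axis=1).astype(float)
    if normalize and numeric.shape[1] > 0:
        # Divide by the number of evaluated numeric columns.
        score = score / numeric.shape[1]
    return pd.Series(score, index=df.index)


demo = pd.DataFrame(
    {"a": [1.0, np.nan, 0.0], "b": [np.inf, 2.0, 0.0], "label": ["x", "y", "z"]}
)
raw_scores = row_badness_sketch(demo)
unit_scores = row_badness_sketch(demo, normalize=True)
print(raw_scores.tolist(), unit_scores.tolist())
```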
- coco_pipe.io.utils.sample_indices(df, target, size_map, rng, replace, prefer_clean, exclude)
Sample indices for each class based on size_map.
- Parameters:
- Return type:
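As a rough illustration of per-class sampling driven by size_map, the sketch below draws a fixed number of row indices for each class label. It assumes size_map maps class labels to sample counts and that rng is a numpy.random.Generator. The prefer_clean and exclude arguments are deliberately left out because their behavior is not documented here, and the name sample_per_class_sketch is hypothetical.

```python
import numpy as np
import pandas as pd


def sample_per_class_sketch(df, target, size_map, rng, replace=False):
    """Draw size_map[label] row indices for each class label (illustrative only)."""
    chosen = []
    for label, n in size_map.items():
        pool = df.index[df[target] == label].to_numpy()
        # Raises ValueError if n exceeds the pool size and replace is False.
        chosen.extend(rng.choice(pool, size=n, replace=replace).tolist())
    return chosen


demo = pd.DataFrame({"y": ["a", "a", "a", "b", "b"]})
picked = sample_per_class_sketch(
    demo, "y", {"a": 2, "b": 1}, np.random.default_rng(0)
)
print(demo.loc[picked, "y"].value_counts().to_dict())
```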
- coco_pipe.io.utils.split_column(name, sep, reverse)
Split a column into (unit, feature) using sep and reverse.
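One plausible reading of sep and reverse is sketched below. It assumes the unit comes first in the column name by default and last when reverse is true, and that only one occurrence of sep is used for the split. Both points are assumptions, as is the helper name split_column_sketch.

```python
def split_column_sketch(name, sep="_", reverse=False):
    """Split a column name into (unit, feature) (illustrative only)."""
    if reverse:
        # Assumption: the unit is the part after the last separator.
        feature, _, unit = name.rpartition(sep)
    else:
        # Assumption: the unit is the part before the first separator.
        unit, _, feature = name.partition(sep)
    return unit, feature


print(split_column_sketch("Fz_alpha_power"))
print(split_column_sketch("alpha_power_Fz", reverse=True))
```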
- coco_pipe.io.utils.read_bids_entry(bids_path, is_pre_epoched, is_evoked, mode, window_length, stride, event_id=None, tmin=-0.2, tmax=0.5, baseline=None, units=None)
- Parameters:
- Return type:
tuple[numpy.ndarray, numpy.ndarray, list[str], float, numpy.ndarray | None]
- coco_pipe.io.utils.load_participants_tsv(root)
Reads participants.tsv and returns dict: {sub_id: {col: val, …}}.
- Parameters:
root (pathlib.Path)
- Return type:
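The nested-dict shape described above can be produced with pandas in a few lines. The sketch assumes rows are keyed by the standard BIDS participant_id column exactly as written in the file; whether the real function normalizes those keys is not stated here. The helper name load_participants_sketch and the demo data are hypothetical.

```python
import tempfile
from pathlib import Path

import pandas as pd


def load_participants_sketch(root):
    """Return {participant_id: {column: value, ...}} (illustrative only)."""
    table = pd.read_csv(Path(root) / "participants.tsv", sep="\t", dtype=str)
    # Assumption: keys are the participant_id values as written in the file.
    return table.set_index("participant_id").to_dict(orient="index")


# Demo on a throwaway dataset root.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "participants.tsv").write_text(
        "participant_id\tage\tsex\nsub-01\t34\tF\nsub-02\t29\tM\n"
    )
    participants = load_participants_sketch(tmp)

print(participants["sub-01"])
```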
- coco_pipe.io.utils.detect_subjects(root)
- Parameters:
root (pathlib.Path)
- Return type:
- coco_pipe.io.utils.detect_sessions(root, subject)
- Parameters:
root (pathlib.Path)
subject (str)
- Return type:
- coco_pipe.io.utils.detect_runs(root, subject, session=None, task=None, datatype='eeg')
Detect available runs for a given subject/session/task.
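Run detection in a BIDS tree typically comes down to scanning file names for the run entity. The sketch below shows one way to do that with pathlib and a regular expression. It assumes subject and session are passed without their sub- and ses- prefixes and that runs are returned as sorted strings; neither is confirmed by the entry above, and detect_runs_sketch is a hypothetical name.

```python
import re
import tempfile
from pathlib import Path


def detect_runs_sketch(root, subject, session=None, task=None, datatype="eeg"):
    """List run labels found in a subject's datatype folder (illustrative only)."""
    folder = Path(root) / f"sub-{subject}"
    if session is not None:
        folder = folder / f"ses-{session}"
    folder = folder / datatype
    runs = set()
    for path in folder.glob(f"sub-{subject}_*"):
        if task is not None and f"_task-{task}_" not in path.name:
            continue
        match = re.search(r"_run-([0-9]+)_", path.name)
        if match:
            runs.add(match.group(1))
    return sorted(runs)


# Demo on a throwaway dataset with three empty recordings.
with tempfile.TemporaryDirectory() as tmp:
    eeg_dir = Path(tmp) / "sub-01" / "eeg"
    eeg_dir.mkdir(parents=True)
    for name in (
        "sub-01_task-rest_run-01_eeg.edf",
        "sub-01_task-rest_run-02_eeg.edf",
        "sub-01_task-oddball_run-03_eeg.edf",
    ):
        (eeg_dir / name).touch()
    rest_runs = detect_runs_sketch(tmp, "01", task="rest")
    all_runs = detect_runs_sketch(tmp, "01")

print(rest_runs, all_runs)
```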
- coco_pipe.io.utils.normalize_subject_value(value)
Normalize a BIDS subject label to a zero-padded 4-digit string.
The sub- prefix is stripped when present, while non-numeric labels are returned unchanged.
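The normalization rule can be sketched in a few lines. This is one reading of the description, assuming that a non-numeric label still has its sub- prefix removed and is otherwise left alone, and that labels longer than four digits are not truncated. The helper name normalize_subject_sketch is hypothetical.

```python
def normalize_subject_sketch(value):
    """Strip a leading 'sub-' and zero-pad numeric labels to 4 digits (illustrative only)."""
    label = str(value).strip()
    if label.startswith("sub-"):
        label = label[len("sub-"):]
    # Assumption: only purely numeric labels are padded.
    return label.zfill(4) if label.isdigit() else label


print(normalize_subject_sketch("sub-7"))
print(normalize_subject_sketch(12))
print(normalize_subject_sketch("sub-ctrl"))
```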
- coco_pipe.io.utils.compute_feature_missingness(df, feature_cols)
Compute per-column missingness and non-finite rates.
NaN values contribute only to the missingness metrics. Positive and negative infinity contribute only to the non-finite metrics.
- Parameters:
df (pandas.DataFrame)
- Return type:
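The separation between missing and non-finite values described above can be sketched as follows. The output column names missing_rate and nonfinite_rate, the DataFrame return shape, and the helper name missingness_sketch are assumptions made for illustration.

```python
import numpy as np
import pandas as pd


def missingness_sketch(df, feature_cols):
    """Per-column NaN rate and +/-Inf rate, kept separate (illustrative only)."""
    values = df[feature_cols].to_numpy(dtype=float)
    n_rows = max(len(df), 1)  # avoid division by zero on an empty frame
    nan_count = np.isnan(values).sum(axis=0)  # NaN only
    inf_count = np.isinf(values).sum(axis=0)  # +Inf and -Inf only
    return pd.DataFrame(
        {"missing_rate": nan_count / n_rows, "nonfinite_rate": inf_count / n_rows},
        index=feature_cols,
    )


demo = pd.DataFrame(
    {"a": [1.0, np.nan, np.inf, -np.inf], "b": [np.nan, np.nan, 0.0, 1.0]}
)
rates = missingness_sketch(demo, ["a", "b"])
print(rates)
```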
- coco_pipe.io.utils.compute_constant_feature_summary(df, feature_cols, tol=1e-12)
Compute per-column variance and constant-feature indicators.
Standard deviations use the population definition (ddof=0). Entirely NaN columns are identified separately and are not marked constant.
- Parameters:
df (pandas.DataFrame)
tol (float)
- Return type:
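A minimal sketch of the constant-feature check follows, assuming a column counts as constant when its population standard deviation is at most tol. The output column names, the DataFrame return shape, and the helper name constant_summary_sketch are assumptions.

```python
import numpy as np
import pandas as pd


def constant_summary_sketch(df, feature_cols, tol=1e-12):
    """Population variance plus constant and all-NaN flags (illustrative only)."""
    features = df[feature_cols].astype(float)
    std = features.std(ddof=0)  # population definition; NaN entries are skipped
    all_nan = features.isna().all()
    # An all-NaN column has a NaN std, so the comparison is False for it anyway;
    # the explicit mask makes the intent visible.
    is_constant = (std <= tol) & ~all_nan
    return pd.DataFrame(
        {
            "variance": features.var(ddof=0),
            "std": std,
            "all_nan": all_nan,
            "is_constant": is_constant,
        }
    )


demo = pd.DataFrame(
    {"flat": [3.0, 3.0, 3.0], "vary": [1.0, 2.0, 3.0], "empty": [np.nan] * 3}
)
summary = constant_summary_sketch(demo, ["flat", "vary", "empty"])
print(summary)
```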