coco_pipe.io.utils

Miscellaneous IO helpers: BIDS loading, stratified sampling, and table utilities.

This module is intentionally thin: heavy quality logic lives in coco_pipe.io.quality; data-structure definitions live in coco_pipe.io.structures. Everything here is either a small utility (read_table, normalize_subject_value) or a sampling helper (make_strata, sample_indices) with no dependency on the QC pipeline.

Attributes

`logger`
`mne`
`BIDSPath`
`read_raw_bids`

Functions

`make_strata`(df, covariates[, n_bins, binning])	Create a single stratification label from multiple covariates.
`row_quality_score`(df[, exclude_cols, count_zero, ...])	Calculate per-row badness from NaN, Inf, and optionally zero counts.
`sample_indices`(df, target, size_map, rng, replace, ...)	Sample indices for each class based on size_map.
`split_column`(name, sep, reverse)	Split a column name into (unit, feature) using sep and reverse.
`read_bids_entry`(bids_path, is_pre_epoched, is_evoked, ...)
`load_participants_tsv`(root)	Read participants.tsv and return a dict of the form {sub_id: {col: val, ...}}.
`detect_subjects`(root)
`detect_sessions`(root, subject)
`detect_runs`(root, subject[, session, task, datatype])	Detect available runs for a given subject/session/task.
`normalize_subject_value`(value)	Normalize a BIDS subject label to a zero-padded 4-digit string.
`compute_feature_missingness`(df, feature_cols)	Compute per-column missingness and non-finite rates.
`compute_constant_feature_summary`(df, feature_cols[, tol])	Compute per-column variance and constant-feature indicators.

Module Contents

coco_pipe.io.utils.logger

coco_pipe.io.utils.mne = None

coco_pipe.io.utils.BIDSPath = None

coco_pipe.io.utils.read_raw_bids = None

coco_pipe.io.utils.make_strata(df, covariates, n_bins=5, binning='quantile')

Create a single stratification label from multiple covariates. Numeric covariates are first binned into n_bins groups (binning='quantile' by default).

Parameters:

df (pandas.DataFrame)
covariates (list[str])
n_bins (int)
binning (str)

Return type:

pandas.Series
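make_strata documents only its inputs and output. The following is a minimal sketch of the technique it names, not the coco_pipe code. Numeric covariates are binned (equal-frequency bins via pandas.qcut when binning='quantile'; treating any other binning value as equal-width bins via pandas.cut is an assumption), categorical covariates are used as-is, and the per-covariate labels are joined with a "|" separator, which is also an assumption. The helper name make_strata_sketch is hypothetical.

```python
import pandas as pd


def make_strata_sketch(
    df: pd.DataFrame,
    covariates: list[str],
    n_bins: int = 5,
    binning: str = "quantile",
) -> pd.Series:
    """Hypothetical sketch: combine several covariates into one stratum label."""
    parts = []
    for col in covariates:
        values = df[col]
        if pd.api.types.is_numeric_dtype(values):
            if binning == "quantile":
                # Equal-frequency bins; duplicate edges (tied values) are merged.
                codes = pd.qcut(values, q=n_bins, labels=False, duplicates="drop")
            else:
                # Assumption: any other binning value means equal-width bins.
                codes = pd.cut(values, bins=n_bins, labels=False)
            # Nullable Int64 keeps integer codes even when NaNs are present.
            parts.append(codes.astype("Int64").astype("string"))
        else:
            parts.append(values.astype("string"))
    # Join the per-covariate labels row-wise, e.g. "M|1" (separator is assumed).
    label = parts[0]
    for part in parts[1:]:
        label = label + "|" + part
    return label
```

With duplicates="drop", quantile binning degrades gracefully when a covariate has many tied values, at the cost of producing fewer than n_bins bins.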

coco_pipe.io.utils.row_quality_score(df, exclude_cols=None, count_zero=True, normalize=False)

Calculate per-row badness from NaN, Inf, and optionally zero counts.

Higher values indicate worse quality. With normalize=True, the counts are divided by the number of evaluated numeric columns, so scores lie in [0, 1].

Parameters:

df (pandas.DataFrame) – Input rows to score.
exclude_cols (list[str] | None) – Columns to exclude before selecting numeric values.
count_zero (bool) – Whether zero values contribute to the badness score.
normalize (bool) – Whether to divide counts by the number of evaluated numeric columns.

Returns:

Row-aligned badness scores. Lower values indicate better quality.

Return type:

pandas.Series

coco_pipe.io.utils.sample_indices(df, target, size_map, rng, replace, prefer_clean, exclude)

Sample indices for each class based on size_map.

Parameters:

df (pandas.DataFrame)
target (str)
size_map (dict[Any, int])
replace (bool)
prefer_clean (bool)
exclude (list[str])

Return type:

pandas.Index
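The parameter list suggests that size_map maps each class of the target column to the number of rows to draw. The sketch below is one plausible reading, not the library's implementation: rng is assumed to be a numpy.random.Generator, and prefer_clean is assumed to mean "take the rows with the fewest NaN/Inf values in numeric columns outside exclude", with ties broken randomly. sample_indices_sketch and its default values are hypothetical.

```python
import numpy as np
import pandas as pd


def sample_indices_sketch(
    df: pd.DataFrame,
    target: str,
    size_map: dict,
    rng: np.random.Generator,
    replace: bool = False,
    prefer_clean: bool = False,
    exclude: list[str] = (),
) -> pd.Index:
    """Hypothetical sketch: draw size_map[cls] row labels for every class cls."""
    chosen = []
    for cls, n in size_map.items():
        candidates = df.index[df[target] == cls]
        if prefer_clean and not replace:
            # Assumed badness: NaN + Inf count over numeric, non-excluded columns.
            numeric = (
                df.loc[candidates]
                .drop(columns=[target, *exclude], errors="ignore")
                .select_dtypes(include="number")
            )
            badness = numeric.isna().sum(axis=1) + np.isinf(numeric).sum(axis=1)
            # Shuffle first so the stable sort breaks ties at random.
            shuffled = badness.iloc[rng.permutation(len(badness))]
            picked = shuffled.sort_values(kind="stable").index[:n]
        else:
            # Plain random draw; raises ValueError if n > len(candidates) without replacement.
            picked = rng.choice(candidates, size=n, replace=replace)
        chosen.extend(picked)
    return pd.Index(chosen)
```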

coco_pipe.io.utils.split_column(name, sep, reverse)

Split a column name into (unit, feature) using sep and reverse.

Parameters:

name (str)
sep (str)
reverse (bool)

Return type:

tuple[str, str]
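The split convention is not spelled out here. This sketch assumes that reverse=False means names of the form unit<sep>feature (split at the first separator) and reverse=True means feature<sep>unit (split at the last separator); split_column_sketch is a hypothetical name.

```python
def split_column_sketch(name: str, sep: str, reverse: bool) -> tuple[str, str]:
    """Hypothetical sketch: split a column name into (unit, feature)."""
    if reverse:
        # Assumed layout "feature<sep>unit": the unit follows the last separator.
        feature, unit = name.rsplit(sep, 1)
    else:
        # Assumed layout "unit<sep>feature": the unit precedes the first separator.
        unit, feature = name.split(sep, 1)
    return unit, feature
```

In this sketch a name that does not contain sep raises ValueError during tuple unpacking.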

coco_pipe.io.utils.read_bids_entry(bids_path, is_pre_epoched, is_evoked, mode, window_length, stride, event_id=None, tmin=-0.2, tmax=0.5, baseline=None, units=None)

Parameters:

bids_path (Any)
is_pre_epoched (bool)
is_evoked (bool)
mode (str)
window_length (float | None)
stride (float | None)
event_id (dict[str, int] | str | list[str] | None)
tmin (float)
tmax (float)
baseline (tuple[float | None, float | None] | None)
units (str | None)

Return type:

tuple[numpy.ndarray, numpy.ndarray, list[str], float, numpy.ndarray | None]
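read_bids_entry carries no description, but its window_length and stride parameters suggest that one of its modes cuts continuous recordings into fixed-length windows. The sketch below illustrates that windowing step only, assuming both values are in seconds and the data is a (n_channels, n_times) array; which mode values trigger windowing is not documented here, and sliding_windows_sketch is a hypothetical helper, not part of coco_pipe.

```python
import numpy as np


def sliding_windows_sketch(
    data: np.ndarray, sfreq: float, window_length: float, stride: float
) -> np.ndarray:
    """Hypothetical sketch: (n_channels, n_times) -> (n_windows, n_channels, win)."""
    win = int(round(window_length * sfreq))  # samples per window
    step = int(round(stride * sfreq))  # samples between window starts
    if win <= 0 or step <= 0:
        raise ValueError("window_length and stride must be positive")
    n_channels, n_times = data.shape
    # Only windows that fit entirely inside the recording are kept.
    starts = np.arange(0, n_times - win + 1, step)
    if starts.size == 0:
        return np.empty((0, n_channels, win), dtype=data.dtype)
    return np.stack([data[:, s:s + win] for s in starts])
```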

coco_pipe.io.utils.load_participants_tsv(root)

Read participants.tsv and return a dict of the form {sub_id: {col: val, …}}.

Parameters:

root (pathlib.Path)

Return type:

dict[str, dict[str, Any]]
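A minimal sketch, assuming a standard BIDS participants.tsv whose participant_id column is used as the dictionary key (kept with its sub- prefix) and whose values are read as strings; pandas' default NA parsing turns BIDS "n/a" cells into NaN. The empty-dict fallback for a missing file and the name load_participants_tsv_sketch are hypothetical.

```python
import tempfile
from pathlib import Path
from typing import Any

import pandas as pd


def load_participants_tsv_sketch(root: Path) -> dict[str, dict[str, Any]]:
    """Hypothetical sketch: map participant_id -> {column: value}."""
    path = Path(root) / "participants.tsv"
    if not path.exists():
        return {}  # assumption: a missing file yields an empty mapping
    # dtype=str keeps labels such as "01" intact; "n/a" still parses as NaN.
    table = pd.read_csv(path, sep="\t", dtype=str)
    return table.set_index("participant_id").to_dict(orient="index")


# Usage: write a tiny participants.tsv into a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "participants.tsv").write_text(
        "participant_id\tage\tsex\nsub-01\t25\tF\nsub-02\tn/a\tM\n"
    )
    participants = load_participants_tsv_sketch(Path(tmp))
```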

coco_pipe.io.utils.detect_subjects(root)

Parameters:

root (pathlib.Path)

Return type:

list[str]

coco_pipe.io.utils.detect_sessions(root, subject)

Parameters:

root (pathlib.Path)
subject (str)

Return type:

list[str]
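detect_subjects and detect_sessions carry no description. Assuming they scan the standard BIDS directory layout (root/sub-<label>/ses-<label>/), a minimal sketch looks like this. Whether the real functions return labels with or without the sub-/ses- prefix is not stated; this sketch strips it. Both function names are hypothetical.

```python
import tempfile
from pathlib import Path


def detect_subjects_sketch(root: Path) -> list[str]:
    """Hypothetical sketch: subject labels from root/sub-<label>/ directories."""
    return sorted(
        p.name[len("sub-"):] for p in Path(root).glob("sub-*") if p.is_dir()
    )


def detect_sessions_sketch(root: Path, subject: str) -> list[str]:
    """Hypothetical sketch: session labels from root/sub-<label>/ses-<label>/."""
    # Accept the subject label with or without its "sub-" prefix.
    subject_dir = Path(root) / f"sub-{subject.removeprefix('sub-')}"
    return sorted(
        p.name[len("ses-"):] for p in subject_dir.glob("ses-*") if p.is_dir()
    )


# Usage: a throwaway BIDS-like tree; plain files matching "sub-*" are ignored.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for rel in ["sub-01/ses-a", "sub-01/ses-b", "sub-02"]:
        (root / rel).mkdir(parents=True)
    (root / "sub-03.txt").touch()
    subjects = detect_subjects_sketch(root)
    sessions = detect_sessions_sketch(root, "sub-01")
    no_sessions = detect_sessions_sketch(root, "02")
```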

coco_pipe.io.utils.detect_runs(root, subject, session=None, task=None, datatype='eeg')

Detect available runs for a given subject/session/task.

Parameters:

root (pathlib.Path)
subject (str)
session (str | None)
task (str | None)
datatype (str)

Return type:

list[str]

coco_pipe.io.utils.normalize_subject_value(value)

Normalize a BIDS subject label to a zero-padded 4-digit string.

The sub- prefix is stripped when present, while non-numeric labels are returned unchanged.

Parameters:

value (object) – Raw subject label from a metadata table or BIDS path component.

Returns:

Normalized subject string.

Return type:

str
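A minimal sketch of the documented rule. The docstring leaves open whether the sub- prefix is also removed from non-numeric labels; this sketch assumes it is. normalize_subject_value_sketch is a hypothetical name.

```python
def normalize_subject_value_sketch(value: object) -> str:
    """Hypothetical sketch: "sub-7" -> "0007"; non-numeric labels are not padded."""
    label = str(value).strip()
    if label.startswith("sub-"):
        label = label[len("sub-"):]
    # Numeric labels are zero-padded to four digits; others pass through.
    return label.zfill(4) if label.isdigit() else label
```

str.zfill leaves labels longer than four digits unchanged. A value that pandas has parsed as a float (for example 7.0) is not padded by this sketch; how the real function handles that case is not documented here.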

coco_pipe.io.utils.compute_feature_missingness(df, feature_cols)

Compute per-column missingness and non-finite rates.

NaN values contribute only to the missingness metrics. Positive and negative infinity contribute only to the non-finite metrics.

Parameters:

df (pandas.DataFrame)
feature_cols (list[str])

Return type:

pandas.DataFrame
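A sketch of the NaN/Inf separation described above. pandas.isna does not flag infinities by default, so the two masks are disjoint. The output column names (n_missing, missing_rate, n_nonfinite, nonfinite_rate) and the helper name are hypothetical; the real function's columns are not listed here.

```python
import numpy as np
import pandas as pd


def compute_feature_missingness_sketch(
    df: pd.DataFrame, feature_cols: list[str]
) -> pd.DataFrame:
    """Hypothetical sketch: one row per feature column."""
    features = df[feature_cols]
    missing = features.isna()  # NaN only; +/-Inf are not NA by default
    nonfinite = features.isin([np.inf, -np.inf])  # +/-Inf only
    return pd.DataFrame(
        {
            "n_missing": missing.sum(),
            "missing_rate": missing.mean(),
            "n_nonfinite": nonfinite.sum(),
            "nonfinite_rate": nonfinite.mean(),
        }
    )
```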

coco_pipe.io.utils.compute_constant_feature_summary(df, feature_cols, tol=1e-12)

Compute per-column variance and constant-feature indicators.

Standard deviations use the population definition (ddof=0). Entirely NaN columns are identified separately and are not marked constant.

Parameters:

df (pandas.DataFrame)
feature_cols (list[str])
tol (float)

Return type:

pandas.DataFrame
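A minimal sketch of the rules stated above: population variance and standard deviation (ddof=0, NaNs skipped), an all-NaN flag, and a constant flag that requires std <= tol and excludes all-NaN columns. The output column names and the helper name are hypothetical.

```python
import numpy as np
import pandas as pd


def compute_constant_feature_summary_sketch(
    df: pd.DataFrame, feature_cols: list[str], tol: float = 1e-12
) -> pd.DataFrame:
    """Hypothetical sketch: one row per feature column."""
    features = df[feature_cols]
    var = features.var(ddof=0)  # population variance, NaNs skipped
    std = features.std(ddof=0)  # population standard deviation
    all_nan = features.isna().all()
    # All-NaN columns have std NaN and are explicitly kept out of "constant".
    is_constant = (std <= tol) & ~all_nan
    return pd.DataFrame(
        {"var": var, "std": std, "all_nan": all_nan, "is_constant": is_constant}
    )
```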