coco_pipe.io.utils

Miscellaneous IO helpers — BIDS loading, stratified sampling, and table utilities.

This module is intentionally thin: heavy quality logic lives in coco_pipe.io.quality; data-structure definitions live in coco_pipe.io.structures. Everything here is either a small utility (read_table, normalize_subject_value) or a sampling helper (make_strata, sample_indices) with no dependency on the QC pipeline.

Attributes

Functions

make_strata(df, covariates[, n_bins, binning])

Create a single stratification label from multiple covariates.

row_quality_score(df[, exclude_cols, count_zero, ...])

Calculate per-row badness from NaN, Inf, and optionally zero counts.

sample_indices(df, target, size_map, rng, replace, ...)

Sample indices for each class based on size_map.

split_column(name, sep, reverse)

Split a column into (unit, feature) using sep and reverse.

read_bids_entry(bids_path, is_pre_epoched, is_evoked, ...)

load_participants_tsv(root)

Read participants.tsv and return a dict keyed by subject ID: {sub_id: {col: val, ...}}.

detect_subjects(root)

detect_sessions(root, subject)

detect_runs(root, subject[, session, task, datatype])

Detect available runs for a given subject/session/task.

normalize_subject_value(value)

Normalize a BIDS subject label to a zero-padded 4-digit string.

compute_feature_missingness(df, feature_cols)

Compute per-column missingness and non-finite rates.

compute_constant_feature_summary(df, feature_cols[, tol])

Compute per-column variance and constant-feature indicators.

Module Contents

coco_pipe.io.utils.logger
coco_pipe.io.utils.mne = None
coco_pipe.io.utils.BIDSPath = None
coco_pipe.io.utils.read_raw_bids = None
coco_pipe.io.utils.make_strata(df, covariates, n_bins=5, binning='quantile')

Create a single stratification label from multiple covariates. Numeric covariates are binned.

Parameters:
Return type:

pandas.Series
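
To make the binning-and-combining idea concrete, here is a minimal sketch of how such a label could be built with pandas. The name make_strata_sketch, the "_" separator, and the equal-width fallback for non-quantile binning are assumptions for illustration; the actual make_strata may differ in all of these.

```python
import pandas as pd


def make_strata_sketch(df, covariates, n_bins=5, binning="quantile"):
    """Illustrative stand-in for make_strata, not the library implementation."""
    parts = []
    for col in covariates:
        values = df[col]
        if pd.api.types.is_numeric_dtype(values):
            if binning == "quantile":
                # Equal-frequency bins; drop duplicate edges caused by ties.
                codes = pd.qcut(values, q=n_bins, labels=False, duplicates="drop")
            else:
                # Assumed fallback: equal-width bins.
                codes = pd.cut(values, bins=n_bins, labels=False)
            parts.append(codes.astype("Int64").astype(str))
        else:
            # Categorical covariates are used as-is.
            parts.append(values.astype(str))
    # Join the per-covariate labels into one label per row (separator is assumed).
    return pd.concat(parts, axis=1).agg("_".join, axis=1)


demo = pd.DataFrame({"age": [20, 30, 40, 50], "sex": ["F", "M", "F", "M"]})
strata = make_strata_sketch(demo, ["age", "sex"], n_bins=2)
```

A combined label of this kind can be passed directly as the stratify argument of a splitter, which is the usual reason for collapsing several covariates into one column.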

coco_pipe.io.utils.row_quality_score(df, exclude_cols=None, count_zero=True, normalize=False)

Calculate per-row badness from NaN, Inf, and optionally zero counts.

Higher values indicate worse quality. With normalize=True, counts are divided by the number of evaluated numeric columns, so scores lie in [0, 1].

Parameters:
  • df (pandas.DataFrame) – Input rows to score.

  • exclude_cols (list[str] | None) – Columns to exclude before selecting numeric values.

  • count_zero (bool) – Whether zero values contribute to the badness score.

  • normalize (bool) – Whether to divide counts by the number of evaluated numeric columns.

Returns:

Row-aligned badness scores. Lower values indicate better quality.

Return type:

pandas.Series
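
The scoring rule described above can be sketched in a few lines of NumPy. This is an illustration under assumptions, not the library code: row_quality_score_sketch is a hypothetical name, and details such as how non-numeric or nullable columns are handled are guesses.

```python
import numpy as np
import pandas as pd


def row_quality_score_sketch(df, exclude_cols=None, count_zero=True, normalize=False):
    """Illustrative stand-in for row_quality_score, not the library implementation."""
    # Drop excluded columns first, then keep numeric columns only.
    numeric = df.drop(columns=exclude_cols or [], errors="ignore")
    numeric = numeric.select_dtypes(include="number")
    values = numeric.to_numpy(dtype=float)

    bad = ~np.isfinite(values)  # True for NaN, +Inf and -Inf
    if count_zero:
        bad |= (values == 0)  # zeros optionally count as bad

    score = bad.sum(axis=1).astype(float)
    if normalize and numeric.shape[1] > 0:
        score /= numeric.shape[1]  # fraction of bad cells per row
    return pd.Series(score, index=df.index)


demo = pd.DataFrame({"id": ["a", "b"], "x": [1.0, np.nan], "y": [0.0, np.inf]})
raw_scores = row_quality_score_sketch(demo)
normalized_scores = row_quality_score_sketch(demo, normalize=True)
no_zero_scores = row_quality_score_sketch(demo, count_zero=False)
```

In this example the string column id is ignored because it is not numeric, so each row is scored over the two columns x and y.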

coco_pipe.io.utils.sample_indices(df, target, size_map, rng, replace, prefer_clean, exclude)

Sample indices for each class based on size_map.

Parameters:
Return type:

pandas.Index
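
As a rough illustration of per-class sampling driven by a size map, the sketch below draws a requested number of index labels for each class. It assumes that size_map maps class labels to sample counts and that rng is a numpy.random.Generator. The prefer_clean and exclude parameters are not documented here, so the sketch deliberately omits them, and sample_indices_sketch is a hypothetical name.

```python
import numpy as np
import pandas as pd


def sample_indices_sketch(df, target, size_map, rng, replace=False):
    """Illustrative per-class sampler, not the library implementation."""
    picks = []
    for label, size in size_map.items():
        # Index labels of all rows belonging to this class.
        pool = df.index[df[target] == label].to_numpy()
        # Without replacement we cannot draw more rows than exist (assumed policy).
        n = size if replace else min(size, len(pool))
        picks.append(rng.choice(pool, size=n, replace=replace))
    return pd.Index(np.concatenate(picks)) if picks else pd.Index([])


demo = pd.DataFrame({"label": ["a", "a", "a", "b", "b"]})
sampled = sample_indices_sketch(
    demo, "label", {"a": 2, "b": 1}, rng=np.random.default_rng(0)
)
```

Passing the generator in explicitly, as the documented signature does, keeps the sampling reproducible without touching global random state.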

coco_pipe.io.utils.split_column(name, sep, reverse)

Split a column into (unit, feature) using sep and reverse.

Parameters:
Return type:

tuple[str, str]

coco_pipe.io.utils.read_bids_entry(bids_path, is_pre_epoched, is_evoked, mode, window_length, stride, event_id=None, tmin=-0.2, tmax=0.5, baseline=None, units=None)
Parameters:
Return type:

tuple[numpy.ndarray, numpy.ndarray, list[str], float, numpy.ndarray | None]

coco_pipe.io.utils.load_participants_tsv(root)

Read participants.tsv and return a dict keyed by subject ID: {sub_id: {col: val, …}}.

Parameters:

root (pathlib.Path)

Return type:

dict[str, dict[str, Any]]
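
The returned shape is easiest to see with a small example. The sketch below assumes the file sits directly under root and is keyed by the participant_id column, as in the BIDS specification. Reading every value as a string and returning an empty dict when the file is missing are assumptions, and load_participants_tsv_sketch is a hypothetical name.

```python
import tempfile
from pathlib import Path

import pandas as pd


def load_participants_tsv_sketch(root):
    """Illustrative stand-in for load_participants_tsv, not the library code."""
    path = Path(root) / "participants.tsv"
    if not path.exists():
        return {}  # assumed behaviour for a dataset without the file
    table = pd.read_csv(path, sep="\t", dtype=str)
    # One inner dict per subject: {sub_id: {col: val, ...}}
    return table.set_index("participant_id").to_dict(orient="index")


# Build a tiny throwaway dataset to show the returned structure.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "participants.tsv").write_text(
        "participant_id\tage\tgroup\n"
        "sub-01\t34\tcontrol\n"
        "sub-02\t29\tpatient\n"
    )
    participants = load_participants_tsv_sketch(tmp)
    missing = load_participants_tsv_sketch(Path(tmp) / "nowhere")
```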

coco_pipe.io.utils.detect_subjects(root)
Parameters:

root (pathlib.Path)

Return type:

list[str]
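
Since this entry has no description, the following sketch shows one plausible way to discover subjects in a BIDS tree: list the sub-* directories under root. Whether the real function returns labels with or without the sub- prefix, and whether it sorts them, is not stated here; the sketch assumes sorted labels without the prefix, and detect_subjects_sketch is a hypothetical name.

```python
import tempfile
from pathlib import Path


def detect_subjects_sketch(root):
    """Illustrative stand-in for detect_subjects, not the library code."""
    prefix = "sub-"
    return sorted(
        path.name[len(prefix):]
        for path in Path(root).glob(f"{prefix}*")
        if path.is_dir()  # skip stray files such as sub-01_notes.txt
    )


# Build a tiny throwaway dataset layout.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("sub-02", "sub-01", "derivatives"):
        (Path(tmp) / name).mkdir()
    (Path(tmp) / "sub-01_notes.txt").write_text("not a subject folder")
    subjects = detect_subjects_sketch(tmp)
```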

coco_pipe.io.utils.detect_sessions(root, subject)
Parameters:
Return type:

list[str]

coco_pipe.io.utils.detect_runs(root, subject, session=None, task=None, datatype='eeg')

Detect available runs for a given subject/session/task.

Parameters:
Return type:

list[str]
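
Run detection in a BIDS dataset usually comes down to reading the run-<index> entity out of file names. The sketch below does this with a regular expression over the subject's datatype folder. The folder layout follows BIDS naming conventions, but the filtering rules, the return format (run labels without the run- prefix), and the name detect_runs_sketch are assumptions.

```python
import re
import tempfile
from pathlib import Path


def detect_runs_sketch(root, subject, session=None, task=None, datatype="eeg"):
    """Illustrative stand-in for detect_runs, not the library code."""
    folder = Path(root) / f"sub-{subject}"
    if session is not None:
        folder = folder / f"ses-{session}"
    folder = folder / datatype

    runs = set()
    for path in folder.glob(f"sub-{subject}_*"):
        # Optional task filter on the task-<label> entity.
        if task is not None and f"_task-{task}_" not in path.name:
            continue
        match = re.search(r"_run-([A-Za-z0-9]+)", path.name)
        if match:
            runs.add(match.group(1))  # a set removes sidecar duplicates
    return sorted(runs)


# Build a tiny throwaway dataset layout.
with tempfile.TemporaryDirectory() as tmp:
    eeg_dir = Path(tmp) / "sub-01" / "eeg"
    eeg_dir.mkdir(parents=True)
    for name in (
        "sub-01_task-rest_run-01_eeg.vhdr",
        "sub-01_task-rest_run-01_eeg.eeg",
        "sub-01_task-rest_run-02_eeg.vhdr",
        "sub-01_task-nback_run-03_eeg.vhdr",
    ):
        (eeg_dir / name).touch()
    rest_runs = detect_runs_sketch(tmp, "01", task="rest")
    all_runs = detect_runs_sketch(tmp, "01")
```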

coco_pipe.io.utils.normalize_subject_value(value)

Normalize a BIDS subject label to a zero-padded 4-digit string.

The sub- prefix is stripped when present, while non-numeric labels are returned unchanged.

Parameters:

value (object) – Raw subject label from a metadata table or BIDS path component.

Returns:

Normalized subject string.

Return type:

str
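
The normalization rule can be sketched as follows. This is one reading of the description, not the library code: it strips the sub- prefix first and then zero-pads only purely numeric labels. How the real function treats whitespace, floats, or non-numeric labels that carry the prefix is not specified here, and normalize_subject_value_sketch is a hypothetical name.

```python
def normalize_subject_value_sketch(value):
    """Illustrative stand-in for normalize_subject_value, not the library code."""
    text = str(value).strip()
    if text.startswith("sub-"):
        text = text[len("sub-"):]
    # Zero-pad numeric labels to 4 digits; leave anything else untouched.
    return text.zfill(4) if text.isdecimal() else text


examples = {
    raw: normalize_subject_value_sketch(raw)
    for raw in ("sub-12", 7, "0042", "ctrl01")
}
```

Normalizing both sides to the same string form before a join avoids mismatches such as "12" in a metadata table against "sub-0012" in a path.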

coco_pipe.io.utils.compute_feature_missingness(df, feature_cols)

Compute per-column missingness and non-finite rates.

NaN values contribute only to the missingness metrics. Positive and negative infinity contribute only to the non-finite metrics.

Parameters:
Return type:

pandas.DataFrame
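
To illustrate how NaN and infinity are counted separately, here is a minimal sketch with pandas and NumPy. The output column names (n_missing, missing_rate, n_nonfinite, nonfinite_rate) and the function name are assumptions; the real function's output schema is not documented here. The sketch also assumes all feature columns are numeric.

```python
import numpy as np
import pandas as pd


def compute_feature_missingness_sketch(df, feature_cols):
    """Illustrative stand-in for compute_feature_missingness, not the library code."""
    values = df[feature_cols]
    n_rows = max(len(values), 1)  # avoid division by zero on empty input

    n_missing = values.isna().sum()        # NaN only; Inf is not NA in pandas
    n_nonfinite = np.isinf(values).sum()   # +Inf and -Inf only

    # One row per feature column (column names here are hypothetical).
    return pd.DataFrame(
        {
            "n_missing": n_missing,
            "missing_rate": n_missing / n_rows,
            "n_nonfinite": n_nonfinite,
            "nonfinite_rate": n_nonfinite / n_rows,
        }
    )


demo = pd.DataFrame(
    {"a": [1.0, np.nan, np.inf, -np.inf], "b": [0.0, 1.0, 2.0, np.nan]}
)
missingness = compute_feature_missingness_sketch(demo, ["a", "b"])
```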

coco_pipe.io.utils.compute_constant_feature_summary(df, feature_cols, tol=1e-12)

Compute per-column variance and constant-feature indicators.

Standard deviations use the population definition (ddof=0). Entirely NaN columns are identified separately and are not marked constant.

Parameters:
Return type:

pandas.DataFrame
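
The interaction between the population standard deviation, the tolerance, and all-NaN columns can be sketched like this. Comparing the standard deviation (rather than the variance) against tol is an assumption, as are the output column names and the function name.

```python
import numpy as np
import pandas as pd


def compute_constant_feature_summary_sketch(df, feature_cols, tol=1e-12):
    """Illustrative stand-in for compute_constant_feature_summary, not the library code."""
    values = df[feature_cols]
    std = values.std(ddof=0)        # population definition; NaN entries are skipped
    all_nan = values.isna().all()   # columns with no observed values at all

    # An all-NaN column has std NaN, and NaN <= tol is False, but the explicit
    # mask makes the "not constant" rule for such columns visible.
    is_constant = (std <= tol) & ~all_nan

    # One row per feature column (column names here are hypothetical).
    return pd.DataFrame(
        {"variance": std**2, "std": std, "all_nan": all_nan, "is_constant": is_constant}
    )


demo = pd.DataFrame(
    {
        "const": [3.0, 3.0, 3.0],
        "varying": [1.0, 2.0, 3.0],
        "empty": [np.nan, np.nan, np.nan],
    }
)
constant_summary = compute_constant_feature_summary_sketch(
    demo, ["const", "varying", "empty"]
)
```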