coco_pipe.descriptors.tables#

Assemble descriptor extraction containers into epoch/subject feature tables.

The descriptor lifecycle is extract → reject → aggregate → merge:

extract — coco_pipe.descriptors.core.DescriptorPipeline.extract() returns a flat ("obs", "feature") DataContainer.
reject — epoch MAD-outlier rejection is coco_pipe.io.quality.drop_epoch_outliers() applied to that container; mad_failures_from_qc() turns the dropped epochs into failure records.
aggregate — build_descriptor_tables() builds the per-epoch table and the group-aggregated subject table (mean + extra grouped stats + optional band ratios) via aggregate() / aggregate_groups().
merge — cross-shard concatenation lives in coco_pipe.descriptors.io.merge_descriptor_tables().

Project-specific concerns (BIDS grouping-key derivation, channel-group pooling, shard file layout, QC reports) stay with the caller.

Author: Hamza Abdelhedi <hamza.abdelhedi@umontreal.ca>

Functions#

`mad_failures_from_qc`(qc_result)	Turn epochs dropped by `drop_epoch_outliers()` into failure records.
`add_aggregated_band_ratios`(base_features_df, ratio_pairs)	Compute band-ratio columns from aggregated mean band-power features.
`build_descriptor_tables`(container, metadata_df, group_by)	Build epoch- and group-aggregated subject-level descriptor tables.

Module Contents#

coco_pipe.descriptors.tables.mad_failures_from_qc(qc_result)#

Turn epochs dropped by drop_epoch_outliers() into failure records.

Mirrors the extractor failure schema (obs_id, obs_index, channel_index, channel_name, family, exception_type, message) so MAD drops flow into the same failure log as extraction failures. When rejection was made per descriptor group, one record is emitted per (group, epoch); otherwise one per dropped epoch.

Parameters:

qc_result (coco_pipe.io.quality.QCResult | None) – The QCResult returned by coco_pipe.io.quality.drop_epoch_outliers(). None yields an empty list.

Return type:

list[dict[str, Any]]

coco_pipe.descriptors.tables.add_aggregated_band_ratios(base_features_df, ratio_pairs, floor=0.0, prefixes=DEFAULT_RATIO_PREFIXES)#

Compute band-ratio columns from aggregated mean band-power features.

For each (numerator, denominator) band pair and each (input_prefix, output_prefix) in prefixes, divide every matching {input_prefix}{numerator}_{suffix} column by its {input_prefix}{denominator}_{suffix} counterpart. Denominators at or below floor yield NaN instead of an unstable division.

Returns:

One column per emitted ratio (empty when nothing matched). Aligned to base_features_df’s row index.

Return type:

pandas.DataFrame

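Parameters:

base_features_df (pandas.DataFrame) – Aggregated mean band-power features.
ratio_pairs (collections.abc.Sequence[tuple[str, str]]) – (numerator, denominator) band pairs.
floor (float) – Denominators at or below this value yield NaN (default 0.0).
prefixes (collections.abc.Sequence[tuple[str, str]]) – (input_prefix, output_prefix) pairs (default DEFAULT_RATIO_PREFIXES).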
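
The column matching and floor rule described above can be sketched in plain pandas. This is an illustration under assumptions, not the library's implementation: sketch_band_ratios, the example prefix pair ("band_abs_", "band_ratio_"), and the output naming {output_prefix}{numerator}_{denominator}_{suffix} are hypothetical, since the contents of DEFAULT_RATIO_PREFIXES and the real output column names are not documented here.

```python
import pandas as pd


def sketch_band_ratios(df, ratio_pairs, floor=0.0,
                       prefixes=(("band_abs_", "band_ratio_"),)):
    """Hypothetical sketch: divide matching numerator/denominator columns."""
    out = {}
    for num, den in ratio_pairs:
        for in_prefix, out_prefix in prefixes:
            num_stub = f"{in_prefix}{num}_"
            for col in df.columns:
                if not col.startswith(num_stub):
                    continue
                suffix = col[len(num_stub):]
                den_col = f"{in_prefix}{den}_{suffix}"
                if den_col not in df.columns:
                    continue  # no counterpart column, so no ratio is emitted
                # Denominators at or below the floor become NaN before dividing.
                denom = df[den_col].where(df[den_col] > floor)
                # Assumed output naming; the real scheme may differ.
                out[f"{out_prefix}{num}_{den}_{suffix}"] = df[col] / denom
    # An empty dict yields a frame with no columns that keeps the row index.
    return pd.DataFrame(out, index=df.index)


base = pd.DataFrame(
    {"band_abs_theta_Fz": [4.0, 2.0], "band_abs_alpha_Fz": [2.0, 0.0]},
    index=["rec-01", "rec-02"],
)
ratios = sketch_band_ratios(base, [("theta", "alpha")])
no_match = sketch_band_ratios(base, [("beta", "alpha")])
```

In this example the second row has a zero alpha value, so its theta/alpha ratio is NaN rather than inf, and the unmatched ("beta", "alpha") pair produces a frame with no columns that is still aligned to the input index.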

coco_pipe.descriptors.tables.build_descriptor_tables(container, metadata_df, group_by, id_col='obs_id', target_col=None, aggregation_groups=None, ratio_pairs=None, ratio_floor=0.0, ratio_prefixes=DEFAULT_RATIO_PREFIXES, min_count=1, on_insufficient='raise')#

Build epoch- and group-aggregated subject-level descriptor tables.

Parameters:

container (coco_pipe.io.structures.DataContainer) – Flat ("obs", "feature") descriptor container from extract() (typically after epoch MAD rejection).
metadata_df (pandas.DataFrame) – One row per epoch, aligned with container.X. Must contain id_col; every other column is carried as an observation coordinate and, when constant within a group, into the subject table.
group_by (str) – Metadata column defining the aggregation groups (e.g. a recording id).
id_col (str) – Observation-id column in metadata_df (default "obs_id").
target_col (str | None) – Optional target column carried onto the subject table.
aggregation_groups (collections.abc.Sequence[collections.abc.Mapping[str, Any]] | None) – aggregate_groups specs producing the subject feature columns (each {"stats": ..., <selectors>}). Defaults to [{"stats": "mean"}] (mean of every feature). This is where median / IQR / etc. subject-level stats are requested.
ratio_pairs (collections.abc.Sequence[tuple[str, str]] | None) – Optional (numerator, denominator) band pairs. When given, band ratios computed from the aggregated mean features are appended via add_aggregated_band_ratios().
ratio_floor (float) – Used only when ratio_pairs is given: the denominator floor for add_aggregated_band_ratios() (default 0.0).
ratio_prefixes (collections.abc.Sequence[tuple[str, str]]) – Used only when ratio_pairs is given: the (input_prefix, output_prefix) pairs for add_aggregated_band_ratios() (default DEFAULT_RATIO_PREFIXES).
min_count (int) – Forwarded to aggregate() and aggregate_groups (default 1).
on_insufficient (str) – Forwarded to aggregate() and aggregate_groups (default "raise"). With "warn", a group (or a single descriptor family within aggregate_groups) whose surviving rows are all-NaN emits NaN features instead of raising, so the subject is retained with whatever else is computable.

Returns:

Dictionary with the keys epoch_df, subject_df, epoch_feature_columns, and subject_feature_columns.

Return type:

dict

Raises:

ValueError – If id_col or group_by is missing from metadata_df.
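
The relationship between the epoch table and the subject table can be sketched in plain pandas. This is a simplified illustration, not the library's code path: sketch_subject_table and the example column names are hypothetical, only the default mean aggregation is shown, and "constant within a group" is interpreted here as constant within every group, which is an assumption.

```python
import pandas as pd


def sketch_subject_table(epoch_df, group_by, feature_columns):
    """Hypothetical sketch: group means plus group-constant metadata."""
    grouped = epoch_df.groupby(group_by, sort=True)
    # Default behaviour described above: mean of every feature per group.
    subject = grouped[feature_columns].mean()
    meta_cols = [
        c for c in epoch_df.columns
        if c not in feature_columns and c != group_by
    ]
    for col in meta_cols:
        # Carry a metadata column only if it has a single value in each group.
        if (grouped[col].nunique(dropna=False) <= 1).all():
            subject[col] = grouped[col].first()
    return subject.reset_index()


epoch_df = pd.DataFrame({
    "obs_id": ["e0", "e1", "e2"],
    "recording": ["r1", "r1", "r2"],
    "site": ["A", "A", "B"],
    "band_abs_alpha_Fz": [1.0, 3.0, 5.0],
})
subject = sketch_subject_table(epoch_df, "recording", ["band_abs_alpha_Fz"])
```

Here site is carried onto the subject table because it is constant within each recording, while obs_id varies within r1 and is left out.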