coco_pipe.descriptors.tables

Assemble descriptor extraction containers into epoch/subject feature tables.

The descriptor lifecycle is extract → reject → aggregate → merge.

Project-specific concerns (BIDS grouping-key derivation, channel-group pooling, shard file layout, QC reports) stay with the caller.

Author: Hamza Abdelhedi <hamza.abdelhedi@umontreal.ca>

Functions

mad_failures_from_qc(qc_result)

Turn epochs dropped by drop_epoch_outliers() into failure records.

add_aggregated_band_ratios(base_features_df, ratio_pairs)

Compute band-ratio columns from aggregated mean band-power features.

build_descriptor_tables(container, metadata_df, group_by)

Build epoch- and group-aggregated subject-level descriptor tables.

Module Contents

coco_pipe.descriptors.tables.mad_failures_from_qc(qc_result)

Turn epochs dropped by drop_epoch_outliers() into failure records.

The records mirror the extractor failure schema (obs_id, obs_index, channel_index, channel_name, family, exception_type, message) so that MAD drops flow into the same failure log as extraction failures. When rejection was performed per descriptor group, one record is emitted per (group, epoch) pair; otherwise, one record is emitted per dropped epoch.

Parameters:

qc_result (coco_pipe.io.quality.QCResult | None) – The QCResult returned by coco_pipe.io.quality.drop_epoch_outliers() (None yields []).

Return type:

list[dict[str, Any]]
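
The record schema described above can be illustrated with a minimal sketch. It does not use the real QCResult, whose attributes are not documented here. The helper name, its two inputs, the use of the family field to carry the descriptor group, and the exception_type and message values are all hypothetical.

```python
from typing import Any, Optional

# Field order copied from the failure schema listed above.
FAILURE_FIELDS = (
    "obs_id", "obs_index", "channel_index", "channel_name",
    "family", "exception_type", "message",
)


def mad_failure_records_sketch(
    dropped: Optional[list[tuple[str, int]]],
    dropped_by_group: Optional[dict[str, list[tuple[str, int]]]] = None,
) -> list[dict[str, Any]]:
    """Hypothetical stand-in: turn dropped (obs_id, obs_index) pairs into records."""
    if dropped_by_group:
        # Per-group rejection: one record per (group, epoch) pair.
        pairs = [
            (group, epoch)
            for group, epochs in dropped_by_group.items()
            for epoch in epochs
        ]
    else:
        # Global rejection: one record per dropped epoch (None yields []).
        pairs = [(None, epoch) for epoch in (dropped or [])]

    records = []
    for group, (obs_id, obs_index) in pairs:
        records.append({
            "obs_id": obs_id,
            "obs_index": obs_index,
            "channel_index": None,           # assumption: a MAD drop is epoch-wide
            "channel_name": None,
            "family": group,                 # assumption: group stored under "family"
            "exception_type": "MADOutlier",  # hypothetical label
            "message": "epoch dropped by MAD outlier rejection",
        })
    return records
```

Because every record carries the same keys as an extraction failure, the two lists can be concatenated and written to a single failure log.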

coco_pipe.descriptors.tables.add_aggregated_band_ratios(base_features_df, ratio_pairs, floor=0.0, prefixes=DEFAULT_RATIO_PREFIXES)

Compute band-ratio columns from aggregated mean band-power features.

For each (numerator, denominator) band pair and each (input_prefix, output_prefix) in prefixes, divide every matching {input_prefix}{numerator}_{suffix} column by its {input_prefix}{denominator}_{suffix} counterpart. Denominators at or below floor yield NaN instead of an unstable division.

Returns:

One column per emitted ratio (empty when nothing matched). Aligned to base_features_df’s row index.

Return type:

pandas.DataFrame
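
The column matching and the floor rule can be sketched with plain pandas. This is an illustration under stated assumptions, not the library's implementation. The function name, the example prefixes ("band_power_", "band_ratio_"), and the output column naming {output_prefix}{numerator}_{denominator}_{suffix} are hypothetical, and the value of DEFAULT_RATIO_PREFIXES is not shown on this page.

```python
import numpy as np
import pandas as pd


def band_ratios_sketch(df, ratio_pairs, prefixes, floor=0.0):
    """Hypothetical sketch of the ratio rule described above."""
    out = {}
    for num, den in ratio_pairs:
        for in_prefix, out_prefix in prefixes:
            num_stem = f"{in_prefix}{num}_"
            for col in df.columns:
                if not col.startswith(num_stem):
                    continue
                suffix = col[len(num_stem):]
                den_col = f"{in_prefix}{den}_{suffix}"
                if den_col not in df.columns:
                    continue  # no matching denominator column
                # Denominators at or below the floor become NaN, so the
                # resulting ratio is NaN instead of an unstable division.
                denom = df[den_col].where(df[den_col] > floor)
                # Assumed output naming; the real naming may differ.
                out[f"{out_prefix}{num}_{den}_{suffix}"] = df[col] / denom
    # Empty when nothing matched; always aligned to the input row index.
    return pd.DataFrame(out, index=df.index)


features = pd.DataFrame({
    "band_power_theta_Fz": [2.0, 4.0],
    "band_power_alpha_Fz": [1.0, 0.0],
})
ratios = band_ratios_sketch(
    features,
    ratio_pairs=[("theta", "alpha")],
    prefixes=[("band_power_", "band_ratio_")],
)
```

In this example the second row has a zero alpha power, which is at the default floor of 0.0, so its theta/alpha ratio is NaN while the first row's ratio is 2.0.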

coco_pipe.descriptors.tables.build_descriptor_tables(container, metadata_df, group_by, id_col='obs_id', target_col=None, aggregation_groups=None, ratio_pairs=None, ratio_floor=0.0, ratio_prefixes=DEFAULT_RATIO_PREFIXES, min_count=1, on_insufficient='raise')

Build epoch- and group-aggregated subject-level descriptor tables.

Parameters:
  • container (coco_pipe.io.structures.DataContainer) – Flat ("obs", "feature") descriptor container from extract() (typically after epoch MAD rejection).

  • metadata_df (pandas.DataFrame) – One row per epoch, aligned with container.X. Must contain id_col. Every other column is carried as an observation coordinate and, when constant within a group, is also carried into the subject table.

  • group_by (str) – Metadata column defining the aggregation groups (e.g. a recording id).

  • id_col (str) – Observation-id column in metadata_df (default "obs_id").

  • target_col (str | None) – Optional target column carried onto the subject table.

  • aggregation_groups (collections.abc.Sequence[collections.abc.Mapping[str, Any]] | None) – aggregate_groups specs producing the subject feature columns (each {"stats": ..., <selectors>}). Defaults to [{"stats": "mean"}] (mean of every feature). Other subject-level statistics (median, IQR, and so on) are requested here.

  • ratio_pairs (collections.abc.Sequence[tuple[str, str]] | None) – (numerator, denominator) band pairs. When given, band ratios computed from the aggregated mean features are appended via add_aggregated_band_ratios().

  • ratio_floor (float) – Denominator floor for the band ratios, corresponding to the floor argument of add_aggregated_band_ratios(). Used only when ratio_pairs is given.

  • ratio_prefixes (collections.abc.Sequence[tuple[str, str]]) – (input_prefix, output_prefix) pairs for the band ratios, corresponding to the prefixes argument of add_aggregated_band_ratios(). Used only when ratio_pairs is given.

  • min_count (int) – Forwarded to aggregate() and aggregate_groups.

  • on_insufficient (str) – Forwarded to aggregate() and aggregate_groups. With "warn", a group (or a single descriptor family within aggregate_groups) whose surviving rows are all NaN emits NaN features instead of raising, so the subject is retained with whatever else is computable. Defaults to "raise".

Returns:

Dictionary with the keys epoch_df, subject_df, epoch_feature_columns, and subject_feature_columns.

Return type:

dict

Raises:

ValueError – If id_col or group_by is missing from metadata_df.
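
The default aggregation (mean of every feature per group, with group-constant metadata carried along) can be approximated in plain pandas. This is a rough sketch of the idea only. It does not call coco_pipe, and the function name, the column names, and the choice to drop non-constant metadata columns are assumptions.

```python
import pandas as pd


def subject_table_sketch(epoch_df, group_by, feature_cols):
    """Hypothetical sketch: per-group feature means plus constant metadata."""
    if group_by not in epoch_df.columns:
        raise ValueError(f"{group_by!r} is missing from the epoch table")

    grouped = epoch_df.groupby(group_by, sort=True)
    features = grouped[feature_cols].mean()

    # Carry a metadata column only when it takes a single value in every group.
    meta_cols = [
        c for c in epoch_df.columns if c not in feature_cols and c != group_by
    ]
    constant = [
        c for c in meta_cols if (grouped[c].nunique(dropna=False) <= 1).all()
    ]
    if not constant:
        return features.reset_index()
    return grouped[constant].first().join(features).reset_index()


epoch_df = pd.DataFrame({
    "obs_id": ["e0", "e1", "e2", "e3"],
    "recording": ["r1", "r1", "r2", "r2"],
    "site": ["A", "A", "B", "B"],
    "power": [1.0, 3.0, 5.0, 7.0],
})
subject_df = subject_table_sketch(epoch_df, "recording", ["power"])
```

Here obs_id varies within each recording and is dropped, while site is constant per recording and is carried into the one-row-per-recording table next to the mean of power.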