coco_pipe.descriptors.qc#

Family-aware QC for descriptor outputs.

Provides column classification and family-level quality aggregation on top of the generic per-column helpers in coco_pipe.io.quality.

Functions#

descriptor_identity(measure)

Return a measure's descriptor identity (aggregation-stat prefix removed).

descriptor_subfamily(family, measure)

Map a (family, measure) pair to its descriptor sub-family.

classify_descriptor_columns(descriptor_names[, ...])

Classify descriptor names into family, measure, channel, and scope.

compute_family_missingness(df, descriptor_names[, ...])

Enrich per-column missingness with descriptor family metadata.

compute_family_constant_summary(df, descriptor_names)

Enrich per-column constant-feature results with family metadata.

select_viable_feature_columns(feature_df, ...[, ...])

Select descriptor columns that pass missingness and degeneracy gates.

summarize_failures(failure_df)

Summarize an extraction failure log by family, channel, and exception.

add_family_diagnostics(family_summary_df, ...)

Add family-specific sanity diagnostics to a family-QC summary.

aggregate_family_qc(df, descriptor_names[, ...])

Aggregate descriptor health indicators to one row per family.

Module Contents#

coco_pipe.descriptors.qc.descriptor_identity(measure)#

Return a measure’s descriptor identity (aggregation-stat prefix removed).

Collapses the per-stat columns of one descriptor (e.g. mean_log_abs_alpha and iqr_log_abs_alpha both map to log_abs_alpha), so that location and spread stay together as one unit.
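The prefix-stripping idea can be illustrated with a minimal sketch. This is not the library's implementation: the helper name strip_stat_prefix and the STAT_PREFIXES tuple are assumptions made for illustration, and the real function may recognise a different set of aggregation statistics.

```python
# Hypothetical prefix list; only mean_, iqr_ and median_ appear in this page.
STAT_PREFIXES = ("mean_", "median_", "iqr_")


def strip_stat_prefix(measure: str) -> str:
    """Return the measure name without a leading aggregation-stat prefix."""
    for prefix in STAT_PREFIXES:
        if measure.startswith(prefix):
            return measure[len(prefix):]
    # No known prefix: the measure is already its own identity.
    return measure


identities = {strip_stat_prefix(m) for m in ("mean_log_abs_alpha", "iqr_log_abs_alpha")}
```

Both per-stat names collapse to the single identity log_abs_alpha, so grouping on the identity keeps the location and spread columns of one descriptor together.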

Parameters:

measure (str)

Return type:

str

coco_pipe.descriptors.qc.descriptor_subfamily(family, measure)#

Map a (family, measure) pair to its descriptor sub-family.

A sub-family is the output type within a family — finer than family but coarser than measure:

  • band: log_abs / rel / corr_log_abs / corr_rel / abs / corr_abs / ratio / corr_ratio (band name stripped)

  • param: aperiodic / peaks / fit_quality

  • complexity: entropy / fractal_complexity / signal_dynamics

Robust to subject-level aggregation-stat prefixes (median_…). Unknown families/measures fall back to "<family>_other" (or "unknown").
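As a rough illustration of the band case and the fallback rule, the sketch below matches a measure against the band sub-families listed above. It is an assumption-laden approximation, not the library's code: guess_subfamily, STAT_PREFIXES, and the prefix-matching strategy are hypothetical, the param and complexity mappings are omitted because this page does not say which measures belong to which sub-family, and returning "unknown" when family is None is a guess at how the two fallbacks are split.

```python
# Hypothetical stat prefixes; the real list may differ.
STAT_PREFIXES = ("mean_", "median_", "iqr_")

# Band sub-families as listed in the documentation above.
SUBFAMILIES = {
    "band": (
        "log_abs", "rel", "corr_log_abs", "corr_rel",
        "abs", "corr_abs", "ratio", "corr_ratio",
    ),
}


def guess_subfamily(family, measure):
    """Map (family, measure) to a sub-family, with the documented fallbacks."""
    if family is None:
        return "unknown"  # assumed meaning of the "unknown" fallback
    # Drop a subject-level aggregation prefix such as "median_".
    for prefix in STAT_PREFIXES:
        if measure.startswith(prefix):
            measure = measure[len(prefix):]
            break
    # Try longer sub-family names first; the trailing band name is ignored.
    for sub in sorted(SUBFAMILIES.get(family, ()), key=len, reverse=True):
        if measure == sub or measure.startswith(sub + "_"):
            return sub
    return f"{family}_other"
```

Under these assumptions, guess_subfamily("band", "median_log_abs_alpha") yields "log_abs", while an unrecognised band measure yields "band_other".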

Parameters:
  • family (str | None)

  • measure (str)

Return type:

str

coco_pipe.descriptors.qc.classify_descriptor_columns(descriptor_names, known_families=KNOWN_FAMILY_TOKENS, *, feature_schema=None)#

Classify descriptor names into family, measure, channel, and scope.

The last _ch- or _chgrp- marker is interpreted as the scope, so earlier channel markers remain part of cross-channel measure names. Unknown family prefixes are retained with family=None. Each call returns a fresh, independently mutable DataFrame so caller changes cannot corrupt the cached canonical result.
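The "last marker wins" rule can be sketched with a regular expression. The function below is illustrative only: split_descriptor_name, the scope labels ("channel", "channel_group", "global"), the assumption that the family is the first underscore-separated token, and the choice to keep the whole stem as the measure when the family is unknown are all hypothetical, and the real function returns a DataFrame rather than a dict.

```python
import re

# Family tokens named elsewhere on this page; the real KNOWN_FAMILY_TOKENS may differ.
KNOWN_FAMILIES = frozenset({"band", "param", "complexity"})
_SCOPE_MARKER = re.compile(r"_(chgrp|ch)-")


def split_descriptor_name(name, known_families=KNOWN_FAMILIES):
    """Split one descriptor name into family, measure, channel, and scope."""
    markers = list(_SCOPE_MARKER.finditer(name))
    if markers:
        last = markers[-1]  # earlier markers stay inside the measure name
        stem, channel = name[:last.start()], name[last.end():]
        scope = "channel_group" if last.group(1) == "chgrp" else "channel"
    else:
        stem, channel, scope = name, None, "global"
    head, sep, rest = stem.partition("_")
    if sep and head in known_families:
        family, measure = head, rest
    else:
        family, measure = None, stem  # unknown prefix is retained
    return {"family": family, "measure": measure, "channel": channel, "scope": scope}
```

With this sketch, a name such as "xyz_coh_ch-Fz_chgrp-frontal" keeps "_ch-Fz" inside the measure and reports the trailing group as its scope.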

Parameters:
Return type:

pandas.DataFrame

coco_pipe.descriptors.qc.compute_family_missingness(df, descriptor_names, known_families=KNOWN_FAMILY_TOKENS, *, feature_schema=None)#

Enrich per-column missingness with descriptor family metadata.

Parameters:
Return type:

pandas.DataFrame

coco_pipe.descriptors.qc.compute_family_constant_summary(df, descriptor_names, tol=1e-12, known_families=KNOWN_FAMILY_TOKENS, *, feature_schema=None)#

Enrich per-column constant-feature results with family metadata.

Parameters:
Return type:

pandas.DataFrame

coco_pipe.descriptors.qc.select_viable_feature_columns(feature_df, descriptor_names, *, max_missing_rate=0.2, drop_all_nan=True, drop_constant=True, constant_tol=1e-12, max_row_drop_rate=None, known_families=KNOWN_FAMILY_TOKENS)#

Select descriptor columns that pass missingness and degeneracy gates.

After the missingness/constant gates, if max_row_drop_rate is set, the surviving columns are pruned further (worst-NaN first) until the number of rows that the caller’s any-NaN purge would still drop is at most max_row_drop_rate * n_rows. In other words, the function prefers shedding a few NaN-heavy columns over losing observations.
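The gating and pruning order described above might look roughly like the following. This is a simplified sketch under stated assumptions, not the library's implementation: select_columns is a hypothetical name, the comparison operators at the thresholds are guesses, constancy is tested as a value range within constant_tol, and only the list of kept columns is returned (the real function also returns a DataFrame).

```python
import pandas as pd


def select_columns(df, columns, max_missing_rate=0.2, constant_tol=1e-12,
                   max_row_drop_rate=None):
    """Keep columns that pass the gates, then prune to a row-drop budget."""
    kept = []
    for col in columns:
        values = df[col]
        if values.isna().mean() > max_missing_rate:
            continue  # too many missing values (also covers all-NaN columns)
        observed = values.dropna()
        if observed.empty or (observed.max() - observed.min()) <= constant_tol:
            continue  # degenerate: no spread among the observed values
        kept.append(col)

    if max_row_drop_rate is not None:
        budget = max_row_drop_rate * len(df)
        # Drop the column with the most NaNs until the any-NaN row count fits.
        while kept and df[kept].isna().any(axis=1).sum() > budget:
            worst = df[kept].isna().sum().idxmax()
            kept.remove(worst)
    return kept
```

The greedy worst-first loop is one simple way to trade columns for rows; it does not guarantee the smallest possible set of dropped columns.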

Parameters:
Return type:

tuple[list[str], pandas.DataFrame]

coco_pipe.descriptors.qc.summarize_failures(failure_df)#

Summarize an extraction failure log by family, channel, and exception.

Parameters:

failure_df (pandas.DataFrame) – Failure records as produced by DescriptorPipeline, with optional family, channel_name, exception_type, and condition columns.

Returns:

A dictionary with the keys by_family, by_channel, by_exception_type, by_condition, by_family_channel (one row per (family, channel) pair), and combined (the four by_* group summaries stacked, with a group column identifying their origin).

Return type:

dict of DataFrame
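A grouping of this kind can be approximated with plain pandas. The sketch below is an assumption, not the library's code: summarize_failure_log, the n_failures and value column names, and the decision to skip a summary when its source column is absent are all hypothetical.

```python
import pandas as pd

# Output key -> source column in the failure log.
GROUPINGS = {
    "by_family": "family",
    "by_channel": "channel_name",
    "by_exception_type": "exception_type",
    "by_condition": "condition",
}


def summarize_failure_log(failure_df):
    """Count failures per group and stack the single-column summaries."""
    out, stacked = {}, []
    for key, col in GROUPINGS.items():
        if col not in failure_df.columns:
            continue  # these columns are optional in the failure log
        counts = failure_df.groupby(col).size().reset_index(name="n_failures")
        out[key] = counts
        stacked.append(counts.rename(columns={col: "value"}).assign(group=key))
    if {"family", "channel_name"} <= set(failure_df.columns):
        out["by_family_channel"] = (
            failure_df.groupby(["family", "channel_name"])
            .size()
            .reset_index(name="n_failures")
        )
    out["combined"] = (
        pd.concat(stacked, ignore_index=True)
        if stacked
        else pd.DataFrame(columns=["value", "n_failures", "group"])
    )
    return out
```

In this sketch the group column of combined holds the key of the summary each row came from, which keeps the stacked table filterable by origin.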

coco_pipe.descriptors.qc.add_family_diagnostics(family_summary_df, feature_missingness_df, feature_df)#

Add family-specific sanity diagnostics to a family-QC summary.

Extends each row of family_summary_df (e.g. from aggregate_family_qc()) with diagnostics specific to the band, param, and complexity descriptor families:

  • band: rate of negative absolute-power values, out-of-range relative power values (outside [0, 1]), and NaN ratio features.

  • param: median/p05 of FOOOF r_squared, median/p95 of fit_error, and missingness of peak-related measures.

  • complexity: median/max missingness across complexity measures and the non-finite rate.
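To make the band diagnostics concrete, the following sketch computes two of the rates named above. It is illustrative only: band_sanity_rates, the output key names, and the choice to exclude NaN values from the denominator are assumptions, and the caller is assumed to already know which columns hold absolute and relative power.

```python
import numpy as np
import pandas as pd


def band_sanity_rates(feature_df, abs_cols, rel_cols):
    """Rate of negative absolute power and of relative power outside [0, 1]."""
    def rate(columns, is_bad):
        values = feature_df[columns].to_numpy(dtype=float).ravel()
        values = values[~np.isnan(values)]  # assumed: NaNs are not counted
        return float(is_bad(values).mean()) if values.size else float("nan")

    return {
        "negative_abs_rate": rate(abs_cols, lambda v: v < 0),
        "rel_out_of_range_rate": rate(rel_cols, lambda v: (v < 0) | (v > 1)),
    }
```

A non-zero rate for either diagnostic usually points to an upstream extraction or normalisation problem rather than to a property of the signal.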

Parameters:
Returns:

A copy of family_summary_df with additional family-specific columns.

Return type:

pandas.DataFrame

coco_pipe.descriptors.qc.aggregate_family_qc(df, descriptor_names, failures_df=None, known_families=KNOWN_FAMILY_TOKENS, tol=1e-12, feature_schema=None)#

Aggregate descriptor health indicators to one row per family.

Parameters:
Return type:

pandas.DataFrame