coco_pipe.descriptors.qc#
Family-aware QC for descriptor outputs.
Provides column classification and family-level quality aggregation on top of
the generic per-column helpers in coco_pipe.io.quality.
Functions#
descriptor_identity | Return a measure's descriptor identity (aggregation-stat prefix removed).
descriptor_subfamily | Map a (family, measure) pair to its descriptor sub-family.
classify_descriptor_columns | Classify descriptor names into family, measure, channel, and scope.
compute_family_missingness | Enrich per-column missingness with descriptor family metadata.
compute_family_constant_summary | Enrich per-column constant-feature results with family metadata.
select_viable_feature_columns | Select descriptor columns that pass missingness and degeneracy gates.
summarize_failures | Summarize an extraction failure log by family, channel, and exception.
add_family_diagnostics | Add family-specific sanity diagnostics to a family-QC summary.
aggregate_family_qc | Aggregate descriptor health indicators to one row per family.
Module Contents#
- coco_pipe.descriptors.qc.descriptor_identity(measure)#
Return a measure’s descriptor identity (aggregation-stat prefix removed).
Collapses the per-stat columns of one descriptor, e.g. mean_log_abs_alpha and iqr_log_abs_alpha both map to log_abs_alpha, so location and spread stay together as one unit.
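The prefix-stripping behaviour described for descriptor_identity can be illustrated with a minimal sketch. It is not the library's implementation: the helper name strip_stat_prefix and the STAT_PREFIXES tuple are assumptions made for illustration, since this page does not list the aggregation statistics that coco_pipe recognises.

```python
# Hypothetical set of aggregation-stat prefixes; the real list is not shown here.
STAT_PREFIXES = ("mean", "median", "iqr", "std")


def strip_stat_prefix(measure: str) -> str:
    """Return the measure name with one leading aggregation-stat prefix removed."""
    for stat in STAT_PREFIXES:
        prefix = stat + "_"
        if measure.startswith(prefix):
            return measure[len(prefix):]
    # No recognised prefix: the name is already a descriptor identity.
    return measure


# Location and spread columns of one descriptor collapse to the same identity.
identities = {strip_stat_prefix(m) for m in ("mean_log_abs_alpha", "iqr_log_abs_alpha")}
print(identities)
```

Grouping columns by this identity keeps the location and spread statistics of one descriptor together when they are kept or dropped.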
- coco_pipe.descriptors.qc.descriptor_subfamily(family, measure)#
Map a (family, measure) pair to its descriptor sub-family.
A sub-family is the output type within a family, finer than family but coarser than measure:
band → log_abs / rel / corr_log_abs / corr_rel / abs / corr_abs / ratio / corr_ratio (band name stripped)
param → aperiodic / peaks / fit_quality
complexity → entropy / fractal_complexity / signal_dynamics
Robust to subject-level aggregation-stat prefixes (median_…). Unknown families/measures fall back to "<family>_other" (or "unknown").
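For the band family, the mapping above amounts to stripping the band name from the measure. The sketch below shows one way to do that with longest-prefix matching. It is an illustration under assumptions, not the library code: band_subfamily and STAT_PREFIXES are hypothetical names, and measures are assumed to look like "<sub-family>_<band>". The param and complexity mappings need a lookup table that this page does not show, so they are left out.

```python
# Sub-family names for the band family, as listed in the docstring above.
BAND_SUBFAMILIES = (
    "log_abs", "rel", "corr_log_abs", "corr_rel",
    "abs", "corr_abs", "ratio", "corr_ratio",
)
# Hypothetical aggregation-stat prefixes (assumption, not taken from coco_pipe).
STAT_PREFIXES = ("mean_", "median_", "iqr_", "std_")


def band_subfamily(measure: str) -> str:
    """Map a band-family measure name to its sub-family (illustrative only)."""
    # Drop one subject-level aggregation prefix if present.
    for prefix in STAT_PREFIXES:
        if measure.startswith(prefix):
            measure = measure[len(prefix):]
            break
    # Try longer names first so "corr_log_abs" wins over "corr_abs" or "abs".
    for sub in sorted(BAND_SUBFAMILIES, key=len, reverse=True):
        if measure == sub or measure.startswith(sub + "_"):
            return sub
    # Fallback in the "<family>_other" form described above.
    return "band_other"


for name in ("log_abs_alpha", "median_corr_log_abs_theta", "abs_beta", "mystery"):
    print(name, "->", band_subfamily(name))
```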
- coco_pipe.descriptors.qc.classify_descriptor_columns(descriptor_names, known_families=KNOWN_FAMILY_TOKENS, *, feature_schema=None)#
Classify descriptor names into family, measure, channel, and scope.
The last _ch- or _chgrp- marker is interpreted as the scope, so earlier channel markers remain part of cross-channel measure names. Unknown family prefixes are retained with family=None. Each call returns a fresh, independently mutable DataFrame so caller changes cannot corrupt the cached canonical result.
- Parameters:
feature_schema (pandas.DataFrame | None)
- Return type:
pandas.DataFrame
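The "last marker wins" rule described for classify_descriptor_columns can be sketched on a single name as follows. This is a plain-Python illustration, not the library's parser. The function name split_descriptor_name, the example column names, the scope labels "channel" and "channel_group", and the KNOWN_FAMILIES tuple (standing in for KNOWN_FAMILY_TOKENS) are all assumptions.

```python
# Stand-in for KNOWN_FAMILY_TOKENS; the real contents are not shown on this page.
KNOWN_FAMILIES = ("band", "param", "complexity")


def split_descriptor_name(name: str) -> dict:
    """Split one descriptor name into family, measure, channel, and scope."""
    ch_pos = name.rfind("_ch-")
    grp_pos = name.rfind("_chgrp-")
    if ch_pos == -1 and grp_pos == -1:
        body, channel, scope = name, None, None
    elif ch_pos > grp_pos:
        # The last marker is a single-channel marker.
        body, channel, scope = name[:ch_pos], name[ch_pos + len("_ch-"):], "channel"
    else:
        # The last marker is a channel-group marker.
        body = name[:grp_pos]
        channel = name[grp_pos + len("_chgrp-"):]
        scope = "channel_group"
    head, _, rest = body.partition("_")
    if head in KNOWN_FAMILIES and rest:
        family, measure = head, rest
    else:
        # Unknown prefix: keep the whole body as the measure, family stays None.
        family, measure = None, body
    return {"family": family, "measure": measure, "channel": channel, "scope": scope}


print(split_descriptor_name("band_log_abs_alpha_ch-Fz"))
# Earlier markers stay inside the measure name of a cross-channel descriptor.
print(split_descriptor_name("band_coh_ch-Fz_chgrp-frontal"))
```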
- coco_pipe.descriptors.qc.compute_family_missingness(df, descriptor_names, known_families=KNOWN_FAMILY_TOKENS, *, feature_schema=None)#
Enrich per-column missingness with descriptor family metadata.
- Parameters:
df (pandas.DataFrame)
feature_schema (pandas.DataFrame | None)
- Return type:
pandas.DataFrame
- coco_pipe.descriptors.qc.compute_family_constant_summary(df, descriptor_names, tol=1e-12, known_families=KNOWN_FAMILY_TOKENS, *, feature_schema=None)#
Enrich per-column constant-feature results with family metadata.
- Parameters:
df (pandas.DataFrame)
tol (float)
feature_schema (pandas.DataFrame | None)
- Return type:
- coco_pipe.descriptors.qc.select_viable_feature_columns(feature_df, descriptor_names, *, max_missing_rate=0.2, drop_all_nan=True, drop_constant=True, constant_tol=1e-12, max_row_drop_rate=None, known_families=KNOWN_FAMILY_TOKENS)#
Select descriptor columns that pass missingness and degeneracy gates.
After the missingness/constant gates, if max_row_drop_rate is set, the surviving columns are further pruned (worst-NaN first) so that the rows the caller’s any-NaN purge would still drop stay within max_row_drop_rate * n_rows. In other words, prefer shedding a few NaN columns over losing observations.
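The row-budget pruning described above can be sketched as a greedy loop: while too many rows contain a NaN, drop the column with the most NaNs. The sketch uses plain dictionaries of lists in place of a DataFrame so that it runs without pandas. The function name prune_for_row_budget and the tie-breaking order are assumptions, and the actual implementation may differ in detail.

```python
import math


def is_missing(value) -> bool:
    """Treat None and float NaN as missing."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def prune_for_row_budget(columns: dict, max_row_drop_rate: float) -> list:
    """Drop worst-NaN columns until any-NaN rows fit within the budget.

    `columns` maps a column name to a list of values, all of equal length.
    """
    kept = dict(columns)
    n_rows = len(next(iter(kept.values()))) if kept else 0
    budget = max_row_drop_rate * n_rows

    def rows_with_nan(cols: dict) -> int:
        # Rows that an any-NaN purge over the kept columns would drop.
        return sum(
            any(is_missing(col[i]) for col in cols.values()) for i in range(n_rows)
        )

    while kept and rows_with_nan(kept) > budget:
        # Shed the column with the most missing values first.
        worst = max(kept, key=lambda name: sum(is_missing(v) for v in kept[name]))
        del kept[worst]
    return list(kept)


nan = float("nan")
features = {
    "a": [1.0] * 10,
    "b": [nan] + [1.0] * 9,
    "c": [1.0] + [nan] * 5 + [1.0] * 4,
}
print(prune_for_row_budget(features, max_row_drop_rate=0.2))
```

In this toy table, keeping all three columns would cost six of ten rows. Dropping column c alone brings the loss down to one row, which fits the budget of two.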
- coco_pipe.descriptors.qc.summarize_failures(failure_df)#
Summarize an extraction failure log by family, channel, and exception.
- Parameters:
failure_df (pandas.DataFrame) – Failure records as produced by DescriptorPipeline, with optional family, channel_name, exception_type, and condition columns.
- Returns:
by_family, by_channel, by_exception_type, by_condition, by_family_channel (one row per (family, channel) pair), and combined (the four by_* group summaries stacked with a group column identifying their origin).
- Return type:
dict of DataFrame
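The shape of the returned dictionary can be mimicked with collections.Counter on a list of failure records. This is a sketch of the grouping idea only: the real function returns DataFrames, and the function name summarize, the "value" and "n_failures" field names, and the sample records are invented for illustration.

```python
from collections import Counter

# Maps each summary key to the (optional) record field it groups by.
GROUPS = {
    "by_family": "family",
    "by_channel": "channel_name",
    "by_exception_type": "exception_type",
    "by_condition": "condition",
}


def summarize(records: list) -> dict:
    """Count failures per group; missing optional fields give empty summaries."""
    out = {}
    for key, field in GROUPS.items():
        out[key] = dict(Counter(r[field] for r in records if field in r))
    out["by_family_channel"] = dict(
        Counter(
            (r["family"], r["channel_name"])
            for r in records
            if "family" in r and "channel_name" in r
        )
    )
    # Stack the four by_* summaries, tagging each row with its origin.
    out["combined"] = [
        {"group": key, "value": value, "n_failures": n}
        for key in GROUPS
        for value, n in out[key].items()
    ]
    return out


failures = [
    {"family": "band", "channel_name": "Fz", "exception_type": "ValueError"},
    {"family": "band", "channel_name": "Cz", "exception_type": "ValueError"},
    {"family": "param", "channel_name": "Fz", "exception_type": "RuntimeError"},
]
summary = summarize(failures)
print(summary["by_family"])
print(summary["by_family_channel"])
```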
- coco_pipe.descriptors.qc.add_family_diagnostics(family_summary_df, feature_missingness_df, feature_df)#
Add family-specific sanity diagnostics to a family-QC summary.
Extends each row of family_summary_df (e.g. from aggregate_family_qc()) with diagnostics specific to the band, param, and complexity descriptor families:
band: rate of negative absolute-power values, out-of-range relative power values (outside [0, 1]), and NaN ratio features.
param: median/p05 of FOOOF r_squared, median/p95 of fit_error, and missingness of peak-related measures.
complexity: median/max missingness across complexity measures and the non-finite rate.
- Parameters:
family_summary_df (pandas.DataFrame) – One row per family, as produced by aggregate_family_qc().
feature_missingness_df (pandas.DataFrame) – Per-column missingness with family metadata, as produced by compute_family_missingness().
feature_df (pandas.DataFrame) – The underlying feature values (epoch- or subject-level) used to compute value-based diagnostics.
- Returns:
A copy of family_summary_df with additional family-specific columns.
- Return type:
pd.DataFrame
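As a rough illustration of the band diagnostics listed above, the sketch below computes the three rates on small hand-made value lists. It assumes that NaN values are excluded from the denominator of the value-range checks, which this page does not confirm. The helper names rate_where and nan_rate and the sample values are hypothetical.

```python
import math


def rate_where(values: list, predicate) -> float:
    """Share of non-NaN values satisfying `predicate` (NaN if none are left)."""
    finite = [v for v in values if not math.isnan(v)]
    if not finite:
        return math.nan
    return sum(1 for v in finite if predicate(v)) / len(finite)


def nan_rate(values: list) -> float:
    """Share of values that are NaN."""
    return sum(1 for v in values if math.isnan(v)) / len(values)


nan = float("nan")
abs_power = [0.5, 1.2, -0.1, 2.0]   # absolute power should never be negative
rel_power = [0.1, 0.4, 1.3, nan]    # relative power should lie in [0, 1]
ratio = [nan, 1.0, 2.0, nan]        # band ratios

negative_abs_rate = rate_where(abs_power, lambda v: v < 0)
out_of_range_rel_rate = rate_where(rel_power, lambda v: v < 0 or v > 1)
ratio_nan_rate = nan_rate(ratio)
print(negative_abs_rate, out_of_range_rel_rate, ratio_nan_rate)
```

Non-zero values for the first two rates indicate an upstream computation problem rather than ordinary missing data.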
- coco_pipe.descriptors.qc.aggregate_family_qc(df, descriptor_names, failures_df=None, known_families=KNOWN_FAMILY_TOKENS, tol=1e-12, feature_schema=None)#
Aggregate descriptor health indicators to one row per family.
- Parameters:
df (pandas.DataFrame)
failures_df (pandas.DataFrame | None)
tol (float)
feature_schema (pandas.DataFrame | None)
- Return type:
pandas.DataFrame