coco_pipe.descriptors.qc
========================

.. py:module:: coco_pipe.descriptors.qc

.. autoapi-nested-parse::

   Family-aware QC for descriptor outputs.

   Provides column classification and family-level quality aggregation on top
   of the generic per-column helpers in :mod:`coco_pipe.io.quality`.


Functions
---------

.. autoapisummary::

   coco_pipe.descriptors.qc.descriptor_identity
   coco_pipe.descriptors.qc.descriptor_subfamily
   coco_pipe.descriptors.qc.classify_descriptor_columns
   coco_pipe.descriptors.qc.compute_family_missingness
   coco_pipe.descriptors.qc.compute_family_constant_summary
   coco_pipe.descriptors.qc.select_viable_feature_columns
   coco_pipe.descriptors.qc.summarize_failures
   coco_pipe.descriptors.qc.add_family_diagnostics
   coco_pipe.descriptors.qc.aggregate_family_qc


Module Contents
---------------

.. py:function:: descriptor_identity(measure)

   Return a measure's descriptor identity (aggregation-stat prefix removed).

   Collapses the per-stat columns of one descriptor into a single identity, so
   that location and spread stay together as one unit. For example,
   ``mean_log_abs_alpha`` and ``iqr_log_abs_alpha`` both map to ``log_abs_alpha``.


.. py:function:: descriptor_subfamily(family, measure)

   Map a ``(family, measure)`` pair to its descriptor sub-family.

   A sub-family is the *output type* within a family, finer than ``family`` but
   coarser than ``measure``:

   - **band** → ``log_abs`` / ``rel`` / ``corr_log_abs`` / ``corr_rel`` / ``abs`` /
     ``corr_abs`` / ``ratio`` / ``corr_ratio`` (band name stripped)
   - **param** → ``aperiodic`` / ``peaks`` / ``fit_quality``
   - **complexity** → ``entropy`` / ``fractal_complexity`` / ``signal_dynamics``

   Robust to subject-level aggregation-stat prefixes (``median_…``). Unknown
   families or measures fall back to ``"_other"`` (or ``"unknown"``).


.. py:function:: classify_descriptor_columns(descriptor_names, known_families = KNOWN_FAMILY_TOKENS, *, feature_schema = None)

   Classify descriptor names into family, measure, channel, and scope.

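
   A minimal sketch of this classification in plain Python, not the library's
   implementation. It assumes names shaped like ``<family>_<measure>_ch-<channel>``
   or ``<family>_<measure>_chgrp-<group>``. The helper ``split_descriptor_name``,
   the ``KNOWN_FAMILIES`` tuple, the scope labels (``"channel"``,
   ``"channel_group"``, ``"global"``), and the ``conn`` family used below are all
   hypothetical stand-ins chosen for illustration.

   .. code-block:: python

      # Hypothetical stand-in for KNOWN_FAMILY_TOKENS.
      KNOWN_FAMILIES = ("band", "param", "complexity")


      def split_descriptor_name(name, known_families=KNOWN_FAMILIES):
          """Split a descriptor name into family, measure, channel, and scope (sketch)."""
          # Only the right-most marker defines the scope; anything before it,
          # including earlier "_ch-" markers, stays in the measure name.
          ch = name.rfind("_ch-")
          grp = name.rfind("_chgrp-")
          if grp > ch:
              scope, body, channel = "channel_group", name[:grp], name[grp + len("_chgrp-"):]
          elif ch >= 0:
              scope, body, channel = "channel", name[:ch], name[ch + len("_ch-"):]
          else:
              scope, body, channel = "global", name, None

          head, _, rest = body.partition("_")
          if head in known_families:
              family, measure = head, rest
          else:
              # Assumption: an unknown prefix keeps the whole stem as the measure.
              family, measure = None, body
          return {"family": family, "measure": measure, "channel": channel, "scope": scope}


      print(split_descriptor_name("band_log_abs_alpha_ch-Fz"))
      # {'family': 'band', 'measure': 'log_abs_alpha', 'channel': 'Fz', 'scope': 'channel'}
      print(split_descriptor_name("conn_coh_ch-Fz_ch-Cz", known_families=("conn",)))
      # {'family': 'conn', 'measure': 'coh_ch-Fz', 'channel': 'Cz', 'scope': 'channel'}

   Searching from the right with ``str.rfind`` is what keeps a cross-channel name
   such as ``coh_ch-Fz`` intact while still extracting ``Cz`` as the scope.
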

   The last ``_ch-`` or ``_chgrp-`` marker is interpreted as the scope, so earlier
   channel markers remain part of cross-channel measure names. Unknown family
   prefixes are retained with ``family=None``.

   Each call returns a fresh, independently mutable DataFrame, so changes made by
   the caller cannot corrupt the cached canonical result.


.. py:function:: compute_family_missingness(df, descriptor_names, known_families = KNOWN_FAMILY_TOKENS, *, feature_schema = None)

   Enrich per-column missingness with descriptor family metadata.


.. py:function:: compute_family_constant_summary(df, descriptor_names, tol = 1e-12, known_families = KNOWN_FAMILY_TOKENS, *, feature_schema = None)

   Enrich per-column constant-feature results with family metadata.


.. py:function:: select_viable_feature_columns(feature_df, descriptor_names, *, max_missing_rate = 0.2, drop_all_nan = True, drop_constant = True, constant_tol = 1e-12, max_row_drop_rate = None, known_families = KNOWN_FAMILY_TOKENS)

   Select descriptor columns that pass missingness and degeneracy gates.

   After the missingness and constant-value gates, if ``max_row_drop_rate`` is
   set, the surviving columns are pruned further (the column with the most NaNs
   first) until the number of rows that the caller's any-NaN purge would still
   drop is at most ``max_row_drop_rate * n_rows``. In other words, shedding a few
   NaN-heavy columns is preferred over losing observations.


.. py:function:: summarize_failures(failure_df)

   Summarize an extraction failure log by family, channel, and exception.

   :param failure_df: Failure records as produced by
       :class:`~coco_pipe.descriptors.core.DescriptorPipeline`, with optional
       ``family``, ``channel_name``, ``exception_type``, and ``condition`` columns.

   :returns: A dict with the keys ``by_family``, ``by_channel``,
       ``by_exception_type``, ``by_condition``, ``by_family_channel`` (one row per
       ``(family, channel)`` pair), and ``combined`` (the four single-key ``by_*``
       summaries stacked, with a ``group`` column identifying the origin of each
       row).
   :rtype: dict of DataFrame


.. py:function:: add_family_diagnostics(family_summary_df, feature_missingness_df, feature_df)

   Add family-specific sanity diagnostics to a family-QC summary.

   Extends each row of *family_summary_df* (e.g. from :func:`aggregate_family_qc`)
   with diagnostics specific to the ``band``, ``param``, and ``complexity``
   descriptor families:

   - ``band``: rate of negative absolute-power values, rate of relative-power
     values outside ``[0, 1]``, and NaN rate of ratio features.
   - ``param``: median and 5th percentile of FOOOF ``r_squared``, median and 95th
     percentile of ``fit_error``, and missingness of peak-related measures.
   - ``complexity``: median and maximum missingness across complexity measures,
     and the non-finite rate.

   :param family_summary_df: One row per family, as produced by
       :func:`aggregate_family_qc`.
   :param feature_missingness_df: Per-column missingness with family metadata, as
       produced by :func:`compute_family_missingness`.
   :param feature_df: The underlying feature values (epoch- or subject-level) used
       to compute value-based diagnostics.

   :returns: A copy of *family_summary_df* with additional family-specific columns.
   :rtype: pd.DataFrame


.. py:function:: aggregate_family_qc(df, descriptor_names, failures_df = None, known_families = KNOWN_FAMILY_TOKENS, tol = 1e-12, feature_schema = None)

   Aggregate descriptor health indicators to one row per family.

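
   A minimal sketch of a per-family roll-up with pandas named aggregation. It
   assumes a per-column QC table with hypothetical ``column``, ``family``,
   ``missing_rate``, and ``is_constant`` columns, plus an optional failure log
   carrying a ``family`` column. The helper ``summarize_by_family`` and its output
   columns are illustrative only, not the actual output schema of
   :func:`aggregate_family_qc`.

   .. code-block:: python

      import pandas as pd


      def summarize_by_family(per_column, failures=None):
          """Collapse per-column QC flags into one row per family (sketch)."""
          summary = per_column.groupby("family").agg(
              n_columns=("column", "count"),
              median_missing_rate=("missing_rate", "median"),
              max_missing_rate=("missing_rate", "max"),
              n_constant=("is_constant", "sum"),  # booleans sum to a count
          )
          if failures is not None:
              # Count logged extraction failures per family; families with none get 0.
              counts = failures["family"].value_counts()
              summary["n_failures"] = counts.reindex(summary.index, fill_value=0)
          return summary.reset_index()


      # Hypothetical per-column QC table and failure log.
      per_column = pd.DataFrame(
          {
              "column": ["band_log_abs_alpha_ch-Fz", "band_rel_alpha_ch-Fz",
                         "param_r_squared_ch-Fz", "param_exponent_ch-Fz"],
              "family": ["band", "band", "param", "param"],
              "missing_rate": [0.0, 0.1, 0.25, 0.05],
              "is_constant": [False, False, True, False],
          }
      )
      failures = pd.DataFrame({"family": ["param", "param"]})
      print(summarize_by_family(per_column, failures))

   Reindexing the failure counts against the summary's family index keeps
   families with no logged failures in the table with an explicit zero.
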