coco_pipe.io.quality
====================

.. py:module:: coco_pipe.io.quality

.. autoapi-nested-parse::

   Data quality measurement and QC gating for loaded data containers.

   Four QC levels are defined:

   * **Level 1**: NaN / Inf / extreme-value row drops at load time (handled by
     :func:`~coco_pipe.descriptors.load_descriptor_table`; counts surfaced via
     container meta).
   * **Level 2**: Epoch-level MAD outlier rejection (:func:`drop_epoch_outliers`).
   * **Level 3**: Subject-level outlier rejection (:func:`drop_subject_outliers`).
   * **Level 4**: Compose levels 2 + 3 and record every decision in a
     :class:`QCResult` (:func:`run_qc`).

   The module also provides lower-level primitives used by the levels above:
   :func:`compute_row_outlier_scores` (MAD z-scores per row),
   :func:`compute_subject_outlier_burden` (per-subject mean epoch burden), and
   :func:`row_quality_score` (simple NaN/Inf/zero count per row, used for
   quality-weighted sampling).


Classes
-------

.. autoapisummary::

   coco_pipe.io.quality.EpochDropRecord
   coco_pipe.io.quality.SubjectDropRecord
   coco_pipe.io.quality.QCResult
   coco_pipe.io.quality.CheckResult


Functions
---------

.. autoapisummary::

   coco_pipe.io.quality.make_qc_flag
   coco_pipe.io.quality.resolve_qc_status
   coco_pipe.io.quality.group_labels
   coco_pipe.io.quality.compute_row_outlier_scores
   coco_pipe.io.quality.compute_subject_outlier_burden
   coco_pipe.io.quality.check_missingness
   coco_pipe.io.quality.check_constant_columns
   coco_pipe.io.quality.check_outliers_zscore
   coco_pipe.io.quality.check_flatline
   coco_pipe.io.quality.drop_epoch_outliers
   coco_pipe.io.quality.drop_subject_outliers
   coco_pipe.io.quality.run_qc


Module Contents
---------------

.. py:class:: EpochDropRecord

   Record of one dropped observation.


   .. py:attribute:: obs_index
      :type: int


   .. py:attribute:: obs_id
      :type: str


   .. py:attribute:: outlier_fraction
      :type: float


   .. py:attribute:: mad_z_max
      :type: float


.. py:class:: SubjectDropRecord

   Record of one dropped subject.


   .. py:attribute:: subject_id
      :type: str


   .. py:attribute:: outlier_fraction
      :type: float


   .. py:attribute:: n_outlier_features
      :type: float


.. py:class:: QCResult

   Structured log of QC decisions produced by :func:`run_qc`.

   Fields are populated by ``run_qc`` and, for the Level-1 counts
   (``n_rows_entering_qc``, ``n_dropped_nan_inf``, ``n_dropped_extreme``),
   read from the container's ``meta`` dict written by
   :func:`~coco_pipe.descriptors.load_descriptor_table`.

   ``family_qc`` is **not** populated by ``run_qc``; the caller computes it via
   :func:`~coco_pipe.descriptors.qc.aggregate_family_qc` and attaches it
   afterwards (``result.family_qc = aggregate_family_qc(...)``), keeping the
   ``io`` layer free of ``descriptors`` imports.


   .. py:attribute:: n_rows_entering_qc
      :type: int | None
      :value: None


   .. py:attribute:: n_dropped_nan_inf
      :type: int
      :value: 0


   .. py:attribute:: n_dropped_extreme
      :type: int
      :value: 0


   .. py:attribute:: n_obs_in
      :type: int
      :value: 0


   .. py:attribute:: n_obs_out
      :type: int
      :value: 0


   .. py:attribute:: n_subjects_in
      :type: int
      :value: 0


   .. py:attribute:: n_subjects_out
      :type: int
      :value: 0


   .. py:attribute:: epoch_drop_threshold
      :type: float | None
      :value: None


   .. py:attribute:: epoch_outlier_fraction_threshold
      :type: float | None
      :value: None


   .. py:attribute:: epochs_dropped
      :type: list[EpochDropRecord]
      :value: []


   .. py:attribute:: subject_drop_threshold
      :type: float | None
      :value: None


   .. py:attribute:: subject_outlier_fraction_threshold
      :type: float | None
      :value: None


   .. py:attribute:: subjects_dropped
      :type: list[SubjectDropRecord]
      :value: []


   .. py:attribute:: per_family_dropped
      :type: dict[str, list[EpochDropRecord | SubjectDropRecord]]


   .. py:attribute:: subject_outlier_burden
      :type: pandas.DataFrame | None
      :value: None


   .. py:attribute:: feature_missingness
      :type: pandas.DataFrame | None
      :value: None


   .. py:attribute:: feature_columns_dropped
      :type: pandas.DataFrame | None
      :value: None


   .. py:attribute:: family_qc
      :type: pandas.DataFrame | None
      :value: None


   .. py:attribute:: thresholds
      :type: dict[str, Any]


   .. py:property:: n_epochs_dropped
      :type: int


   .. py:property:: n_subjects_dropped
      :type: int


   .. py:property:: retention_rate
      :type: float

      Return the fraction of input observations retained.


   .. py:property:: total_dropped
      :type: int

      Return total rows dropped across all QC levels.


   .. py:method:: summary()

      Return a flat summary suitable for logs and report headers.


.. py:class:: CheckResult

   Result of a data quality check.

   :ivar check_name: Name of the check (e.g., "Missing Values").
   :vartype check_name: str
   :ivar status: "OK", "WARN", or "FAIL".
   :vartype status: str
   :ivar message: Human-readable description of the issue.
   :vartype message: str
   :ivar severity: 0 (Info) to 10 (Critical).
   :vartype severity: int
   :ivar metric_name: Name of the metric evaluated (e.g., "missing_pct").
   :vartype metric_name: str, optional
   :ivar metric_value: Value of the metric.
   :vartype metric_value: float, int, or str, optional

   .. rubric:: Examples

   >>> res = CheckResult("Missingness", "FAIL", "Too many NaNs", 9)
   >>> res.is_issue
   True


   .. py:attribute:: check_name
      :type: str


   .. py:attribute:: status
      :type: coco_pipe.io._constants.QualityStatus


   .. py:attribute:: message
      :type: str


   .. py:attribute:: severity
      :type: int


   .. py:attribute:: metric_name
      :type: str | None
      :value: None


   .. py:attribute:: metric_value
      :type: float | int | str | None
      :value: None


   .. py:property:: is_issue
      :type: bool

      Return True if status is WARN or FAIL.


   .. py:method:: from_flag_dict(flag)
      :classmethod:

      Construct a CheckResult from a :func:`make_qc_flag` record.


.. py:function:: make_qc_flag(level, code, message, value = None, threshold = None, scope = None)

   Create a structured QC flag record.


.. py:function:: resolve_qc_status(flags)

   Return the worst status level from a list of QC flag dicts.


.. py:function:: group_labels(container, group_by = 'family')

   Unique feature-group labels a container spans, at ``group_by`` granularity.

   Resolves labels from the container's structured
   :meth:`~coco_pipe.io.structures.DataContainer.feature_schema` (enriched from
   descriptor-name parsing when the schema is partial) and returns them
   de-duplicated in first-seen order. This is the structured replacement for
   hand-rolled "which families/measures does this analysis unit cover" helpers:
   pass the sliced unit container from
   :func:`~coco_pipe.io.units.iter_analysis_units` to learn which QC labels it
   maps to.

   :param container: Any container with a ``feature`` axis (e.g. one analysis unit).
   :param group_by: Grouping granularity: ``"family"``, ``"measure"``, or ``"feature"``.

   :returns: Distinct labels at the requested granularity; ``[]`` when the
             container has no ``feature`` axis.
   :rtype: list of str


.. py:function:: compute_row_outlier_scores(df, feature_cols, z_threshold = 5.0, descriptor_names = None, group_by = None, feature_schema = None)

   Compute per-row outlier fractions using MAD-based robust z-scores.

   When ``group_by`` is set (``"family"``, ``"measure"``, or ``"feature"``) the
   result also carries per-group ``outlier_fraction_