coco_pipe.descriptors.io
========================

.. py:module:: coco_pipe.descriptors.io

.. autoapi-nested-parse::

   Descriptor-table file IO: save, load, merge, and feature-column consistency.

   This is the descriptor-specific table IO layer. It builds on the generic
   :func:`coco_pipe.io._serialization.read_table` primitive but owns the
   descriptor concerns: the ``_feature_columns.json`` sidecar contract, loading a
   descriptor table into a :class:`~coco_pipe.io.structures.DataContainer` (flat or
   sensor x feature), and the cross-shard **merge** stage.

   Author: Hamza Abdelhedi <hamza.abdelhedi@umontreal.ca>


Functions
---------

.. autoapisummary::

   coco_pipe.descriptors.io.save_descriptor_table
   coco_pipe.descriptors.io.check_feature_column_consistency
   coco_pipe.descriptors.io.merge_descriptor_tables
   coco_pipe.descriptors.io.load_descriptor_table


Module Contents
---------------

.. py:function:: save_descriptor_table(df, base_path, feature_columns = None, formats = ('parquet', ))

   Write a descriptor table, with an optional feature-column sidecar.

   :func:`load_descriptor_table` reads whichever single file it is pointed at,
   so by default only the canonical ``{base_path}.parquet`` is written.  Pass
   ``formats=("parquet", "csv")`` to additionally emit a human-readable
   ``{base_path}.csv``, which roughly doubles the on-disk footprint.  A
   ``{base_path.name}_feature_columns.json`` sidecar is written when
   *feature_columns* is given.

   :param df: Table to write.
   :param base_path: Output path without suffix, e.g. ``combined/sensor_subject_features``.
   :param feature_columns: Optional ordered list of descriptor feature-column names written to
                           a ``_feature_columns.json`` sidecar alongside the table.
   :param formats: Table formats to write. Any subset of ``{"parquet", "csv"}``; defaults
                   to parquet only.

   :raises ValueError: If *formats* is empty or contains an unsupported format.
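
   The naming rules above can be sketched as a small standalone helper. This is
   an illustration only, not the library's implementation: ``planned_outputs``
   and the feature names are hypothetical, and placing the sidecar in the same
   directory as the table is an assumption based on "alongside the table".

   .. code-block:: python

      from pathlib import Path

      SUPPORTED_FORMATS = {"parquet", "csv"}  # taken from the *formats* description


      def planned_outputs(base_path, feature_columns=None, formats=("parquet",)):
          """Hypothetical helper: list the files a save call would be expected to create."""
          formats = tuple(formats)
          unsupported = sorted(set(formats) - SUPPORTED_FORMATS)
          if not formats or unsupported:
              raise ValueError(
                  f"formats must be a non-empty subset of {sorted(SUPPORTED_FORMATS)}, "
                  f"got {formats!r}"
              )
          base_path = Path(base_path)
          # Build each name from the full stem so a dotted stem is not truncated.
          outputs = [base_path.with_name(f"{base_path.name}.{fmt}") for fmt in formats]
          if feature_columns is not None:
              # Assumption: the sidecar sits in the same directory as the table.
              outputs.append(base_path.with_name(f"{base_path.name}_feature_columns.json"))
          return outputs


      paths = planned_outputs(
          "combined/sensor_subject_features",
          feature_columns=["alpha_power", "beta_power"],  # hypothetical feature names
          formats=("parquet", "csv"),
      )
      print([p.as_posix() for p in paths])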


.. py:function:: check_feature_column_consistency(shard_root, json_name, accumulated, col_key)

   Load a feature-column sidecar from *shard_root* and assert consistency.

   Intended for merging per-shard descriptor outputs: on the first call for
   a given *col_key* the loaded column list is stored in *accumulated*. On
   every subsequent call the loaded list is compared against the stored one
   and a :class:`ValueError` is raised on any mismatch, preventing a silent
   merge of shards produced with incompatible feature sets.

   :param shard_root: Directory containing the ``json_name`` feature-column sidecar.
   :param json_name: Filename of the feature-column JSON sidecar within *shard_root*.
   :param accumulated: Mapping of ``col_key -> feature column list``, mutated in place.
   :param col_key: Key identifying which feature-column set this sidecar belongs to
                   (e.g. ``"sensor_epoch"``).
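
   The store-then-compare behaviour described above might look roughly like the
   sketch below. It is not the library's code: ``check_consistency_sketch`` is a
   hypothetical stand-in, the sidecar is assumed to be a plain JSON array of
   column names, and treating a reordered list as a mismatch is also an
   assumption.

   .. code-block:: python

      import json
      import tempfile
      from pathlib import Path


      def check_consistency_sketch(shard_root, json_name, accumulated, col_key):
          """Sketch of the store-then-compare behaviour; not the library's code."""
          # Assumption: the sidecar is a JSON array of column names.
          columns = json.loads((Path(shard_root) / json_name).read_text())
          if col_key not in accumulated:
              accumulated[col_key] = columns  # the first shard sets the reference
          elif accumulated[col_key] != columns:
              raise ValueError(
                  f"{col_key}: feature columns in {shard_root} differ from the first shard"
              )


      with tempfile.TemporaryDirectory() as tmp:
          name = "sensor_epoch_feature_columns.json"  # hypothetical sidecar filename
          shards = {
              "shard_a": ["f1", "f2"],
              "shard_b": ["f1", "f2"],
              "shard_c": ["f2", "f1"],  # same names, different order
          }
          for shard, columns in shards.items():
              (Path(tmp) / shard).mkdir()
              (Path(tmp) / shard / name).write_text(json.dumps(columns))

          accumulated = {}
          check_consistency_sketch(Path(tmp) / "shard_a", name, accumulated, "sensor_epoch")
          check_consistency_sketch(Path(tmp) / "shard_b", name, accumulated, "sensor_epoch")
          try:
              check_consistency_sketch(Path(tmp) / "shard_c", name, accumulated, "sensor_epoch")
              mismatch_detected = False
          except ValueError:
              mismatch_detected = True  # the reordered list is rejected in this sketch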


.. py:function:: merge_descriptor_tables(table_paths, feature_columns_paths = None, *, out_base_path = None, formats = ('parquet', ))

   Merge per-shard tables of one *table kind* into a single table.

   This is the cross-shard **merge** stage. A "table kind" is one descriptor
   output table (e.g. ``sensor_epoch``, ``sensor_subject``, or
   ``pooled_subject``) written once per shard; this function row-concatenates
   that kind across shards. (A table kind is not the band/param/complexity
   descriptor *family*.) Each shard is read, its feature-column sidecar is
   optionally checked against the first shard's, the
   rows are concatenated, and the combined table (plus sidecar) is optionally
   written via :func:`save_descriptor_table`. Discovery, manifests, and
   dataset-level QC are deliberately left to the caller, which calls this once
   per table kind.

   :param table_paths: Per-shard table files (``.csv`` / ``.parquet``) for one table kind, in
                       the desired row order.
   :param feature_columns_paths: Optional per-shard feature-column JSON sidecars, aligned with
                                 *table_paths*. When given, cross-shard consistency is enforced via
                                 :func:`check_feature_column_consistency` and the agreed column list is
                                 used as the combined sidecar.
   :param out_base_path: Optional output path without suffix. When set, the combined table is
                         written there via :func:`save_descriptor_table`.
   :param formats: Output formats forwarded to :func:`save_descriptor_table` (default
                   parquet only).

   :returns: ``(combined_df, feature_columns)`` where ``feature_columns`` is the
             validated column list when *feature_columns_paths* was provided, else
             ``None``.
   :rtype: tuple

   :raises ValueError: If *table_paths* is empty, the sidecar list is misaligned, or a shard's
       feature columns differ from the first shard.
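
   The validate, check, and concatenate steps could be sketched with the
   standard library alone, as below. This is a simplified illustration under
   stated assumptions, not the library's implementation: ``merge_tables_sketch``
   is hypothetical, it reads only CSV shards, returns rows as dictionaries
   rather than a data frame, assumes each sidecar is a JSON array of column
   names, and skips the optional write step.

   .. code-block:: python

      import csv
      import json
      import tempfile
      from pathlib import Path


      def merge_tables_sketch(table_paths, feature_columns_paths=None):
          """Sketch: validate inputs, check sidecars against the first shard, concatenate."""
          if not table_paths:
              raise ValueError("table_paths is empty")
          if feature_columns_paths is not None and len(feature_columns_paths) != len(table_paths):
              raise ValueError("feature_columns_paths is not aligned with table_paths")

          agreed, rows = None, []
          for index, table_path in enumerate(table_paths):
              if feature_columns_paths is not None:
                  # Assumption: each sidecar is a JSON array of column names.
                  columns = json.loads(Path(feature_columns_paths[index]).read_text())
                  if agreed is None:
                      agreed = columns
                  elif columns != agreed:
                      raise ValueError(
                          f"feature columns of {table_path} differ from the first shard"
                      )
              with open(table_path, newline="") as handle:
                  rows.extend(csv.DictReader(handle))  # keeps the caller's shard order
          return rows, agreed


      with tempfile.TemporaryDirectory() as tmp:
          tables, sidecars = [], []
          # Hypothetical shards, each holding one subject row.
          for shard, subject in [("shard_a", "sub-01"), ("shard_b", "sub-02")]:
              table = Path(tmp) / f"{shard}.csv"
              table.write_text(f"subject,f1\n{subject},0.5\n")
              sidecar = Path(tmp) / f"{shard}_feature_columns.json"
              sidecar.write_text(json.dumps(["f1"]))
              tables.append(table)
              sidecars.append(sidecar)

          combined, feature_columns = merge_tables_sketch(tables, sidecars)
          combined_without_sidecars, no_columns = merge_tables_sketch(tables)

   Because rows are appended in the order of *table_paths*, the caller controls
   the row order of the combined table, as the parameter description states.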


.. py:function:: load_descriptor_table(table_path, feature_columns_path, known_families = ('band', 'param', 'complexity'), condition = None, target_col = None, subjects = None, subject_col = 'subject', analysis_mode = 'flat', descriptor_families = None, descriptor_max_abs_value = None, drop_degenerate_columns = False, max_missing_rate = 0.2, drop_constant_columns = True, constant_tol = 1e-12, max_row_drop_rate = None, location_statistic = None, exclude_subfamilies = None)

   Load a descriptor feature table into a :class:`~coco_pipe.io.DataContainer`.