coco_pipe.io.load
=================

.. py:module:: coco_pipe.io.load

.. autoapi-nested-parse::

   coco_pipe/io/load.py
   --------------------
   High-level data loading factory.

   Author: Hamza Abdelhedi <hamza.abdelhedi@umontreal.ca>


Functions
---------

.. autoapisummary::

   coco_pipe.io.load.load_data


Module Contents
---------------

.. py:function:: load_data(path = None, mode = 'auto', target_col = None, index_col = None, sep = '\t', header = 0, sheet_name = 0, columns_to_dims = None, col_sep = '_', meta_columns = None, clean = False, clean_kwargs = None, task = None, session = None, runs = None, datatype = 'eeg', suffix = None, loading_mode = 'epochs', window_length = None, stride = None, event_id = None, tmin = -0.2, tmax = 0.5, baseline = None, drop_short_epochs = True, units = None, subject_metadata_df = None, subject_key = None, pattern = '*.pkl', dims = ('obs', 'feature'), coords = None, run = None, processing = None, reader = None, id_fn = None, subjects = None, config = None, **kwargs)

   Universal data loader factory.
   Dispatches to `BIDSDataset`, `TabularDataset`, or `EmbeddingDataset` based on
   `mode`.

   :param path: Path to data source (file or directory). Required unless ``config`` is
                given (in which case ``config.path`` is used).
   :type path: str or Path, optional
   :param mode: Type of data to load.

                - "auto": Infers type from file extension or directory structure. A
                  directory with ``dataset_description.json`` or ``sub-*`` entries is
                  treated as ``"bids"``; ``.csv``/``.tsv``/``.xls``/``.xlsx``/``.txt``
                  files as ``"tabular"``; everything else as ``"embedding"``.
                - "tabular": uses `TabularDataset` (CSV, TSV, Excel, TXT).
                - "bids": uses `BIDSDataset` (BIDS-compliant directories).
                - "embedding": uses `EmbeddingDataset` (NPY, PKL, H5, JSON).
   :type mode: {"auto", "tabular", "bids", "embedding"}, default="auto"
   :param config: A pre-validated configuration object (see :mod:`coco_pipe.io.config`).
                  When provided, its fields drive the load and ``mode`` is taken from the
                  config; the matching keyword arguments below are ignored. When omitted,
                  the relevant keyword arguments are assembled into a config and validated
                  before dispatch. The non-serializable parameters ``reader``, ``id_fn``,
                  ``subject_metadata_df``, and ``subject_key`` are always passed through
                  directly and are never part of the config schema.
   :type config: DatasetConfig or {Tabular,BIDS,Embedding}Config, optional

   .. rubric:: Tabular Arguments (mode="tabular")

   :param target_col: Name of the column to extract as target `y`. Removed from features `X`.
   :type target_col: str, optional
   :param index_col: Column to use as index (observation IDs).
   :type index_col: str or int, optional
   :param sep: Separator for text files (e.g. ',' for CSV).
   :type sep: str, default='\\t'
   :param header: Row number(s) to use as column names.
   :type header: int or list of int, default=0
   :param sheet_name: Sheet name or index for Excel files.
   :type sheet_name: str or int, default=0
   :param columns_to_dims: If provided, attempts to reshape the 2D feature columns into an N-D array
                           with these dimension names. Column names must follow the pattern
                           `dim1_dim2_..._feature`, with parts joined by ``col_sep``.
   :type columns_to_dims: list of str, optional
   :param col_sep: Separator used in column names for reshaping.
   :type col_sep: str, default='_'
   :param meta_columns: Columns to extract as metadata coordinates instead of features.
   :type meta_columns: list of str, optional
   :param clean: Whether to perform automated cleaning (drop NaNs/Infs).
   :type clean: bool, default=False
   :param clean_kwargs: Arguments passed to `TabularDataset.clean`.
   :type clean_kwargs: dict, optional

   .. rubric:: BIDS Arguments (mode="bids")

   :param task: BIDS task name (e.g., 'rest', 'audiovisual').
   :type task: str, optional
   :param session: Session ID(s) to load. Defaults to all available.
   :type session: str or List[str], optional
   :param datatype: Data type folder (e.g., 'eeg', 'meg', 'ieeg').
   :type datatype: str, default='eeg'
   :param suffix: File suffix to load (e.g., 'eeg', 'epo', 'ave').
   :type suffix: str, optional
   :param loading_mode: How to process the data. Named ``loading_mode`` here (and in
                        ``BIDSConfig``) to avoid colliding with this function's ``mode``
                        argument; it is passed to ``BIDSDataset`` as ``mode``.

                        - 'epochs': Slices continuous data into fixed-length windows.
                        - 'continuous': Loads as single continuous segments.
                        - 'load_existing': Loads pre-computed epochs.
   :type loading_mode: str, default='epochs'
   :param window_length: Window length in seconds (for 'epochs' mode).
   :type window_length: float, optional
   :param stride: Stride in seconds (for 'epochs' mode).
   :type stride: float, optional
   :param subject_metadata_df: External subject-level metadata to merge by subject during BIDS loading.
   :type subject_metadata_df: DataFrame, optional
   :param subject_key: Column in `subject_metadata_df` containing the BIDS subject identifier.
   :type subject_key: str, optional
   :param subjects: Specific subject IDs to load (without 'sub-').
   :type subjects: str or List[str], optional

   .. rubric:: Embedding Arguments (mode="embedding")

   :param pattern: Glob pattern to match files.
   :type pattern: str, default='\*.pkl'
   :param dims: Dimension labels for the data arrays.
   :type dims: tuple of str, default=('obs', 'feature')
   :param coords: Dictionary of coordinates for dimensions.
   :type coords: dict, optional
   :param reader: Custom file reader function.
   :type reader: callable, optional
   :param id_fn: Custom subject ID extraction function.
   :type id_fn: callable, optional
   :param subjects: If int, loads first N subjects. If list, filters by ID.
   :type subjects: int or list, optional

   :returns: Standardized data container with attributes:

             - ``X``: (N_obs, ...) data array
             - ``y``: Targets (if available)
             - ``ids``: Observation identifiers
             - ``coords``: Coordinate metadata
   :rtype: DataContainer
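
   The ``"auto"`` rule described under ``mode`` can be sketched as a standalone
   function. This is a minimal illustration, not the library's implementation:
   ``infer_mode`` and ``TABULAR_SUFFIXES`` are hypothetical names, and the
   case-insensitive extension match is an assumption.

   .. code-block:: python

      from pathlib import Path

      # Extensions the ``mode`` description lists as tabular.
      TABULAR_SUFFIXES = {".csv", ".tsv", ".xls", ".xlsx", ".txt"}


      def infer_mode(path):
          """Hypothetical helper mirroring the documented "auto" rule."""
          p = Path(path)
          if p.is_dir():
              # A BIDS root has dataset_description.json or sub-* entries.
              has_description = (p / "dataset_description.json").exists()
              has_subjects = any(p.glob("sub-*"))
              if has_description or has_subjects:
                  return "bids"
              return "embedding"
          # Assumption: extensions are compared case-insensitively.
          if p.suffix.lower() in TABULAR_SUFFIXES:
              return "tabular"
          # Everything else falls through to the embedding loader.
          return "embedding"

   Passing ``mode`` explicitly skips this inference, which is useful when a
   directory layout or file extension would otherwise be ambiguous.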

   .. rubric:: Examples

   Two equivalent ways to load. The keyword form is convenient for quick,
   interactive use:

   >>> container = load_data("features.csv", mode="tabular", target_col="y")

   The **config-first** form is recommended for pipelines and reproducible
   runs: a :class:`~coco_pipe.io.config.TabularConfig` /
   :class:`~coco_pipe.io.config.BIDSConfig` /
   :class:`~coco_pipe.io.config.EmbeddingConfig` is validated once and can be
   serialized, version-controlled, and reused. It also keeps each mode's
   options self-contained instead of mixing all three modes' keywords:

   >>> from coco_pipe.io.config import TabularConfig
   >>> cfg = TabularConfig(path="features.csv", target_col="y")
   >>> container = load_data(config=cfg)

   BIDS loading uses ``loading_mode`` (not ``mode``) to choose the windowing
   strategy:

   >>> container = load_data(
   ...     "/data/bids",
   ...     mode="bids",
   ...     task="rest",
   ...     loading_mode="epochs",
   ...     window_length=2.0,
   ... )
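
   To make ``window_length`` and ``stride`` concrete, the sketch below computes
   the sample ranges that fixed-length windowing would produce. It is an
   illustration under stated assumptions rather than the ``BIDSDataset`` code:
   ``window_bounds`` is a hypothetical helper, the stride is assumed to default
   to the window length (no overlap), and ``drop_short`` is assumed to behave
   like ``drop_short_epochs``.

   .. code-block:: python

      def window_bounds(n_samples, sfreq, window_length, stride=None, drop_short=True):
          """Return (start, stop) sample indices of fixed-length windows.

          ``window_length`` and ``stride`` are in seconds; ``sfreq`` is in Hz.
          """
          win = int(round(window_length * sfreq))
          # Assumption: a missing stride means non-overlapping windows.
          step = int(round((window_length if stride is None else stride) * sfreq))
          if win <= 0 or step <= 0:
              raise ValueError("window_length and stride must be positive")

          bounds = []
          for start in range(0, n_samples, step):
              stop = min(start + win, n_samples)
              if stop - start < win and drop_short:
                  # Every later window would be short as well, so stop here.
                  break
              bounds.append((start, stop))
          return bounds


      # 10 s of data at 100 Hz, 2 s windows.
      non_overlapping = window_bounds(1000, 100.0, 2.0)
      overlapping = window_bounds(1000, 100.0, 2.0, stride=1.0)

   Under these assumptions, the 10 s recording yields five non-overlapping 2 s
   windows, or nine windows when ``stride=1.0``.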