coco_pipe.io.load
=================

.. py:module:: coco_pipe.io.load

.. autoapi-nested-parse::

   coco_pipe/io/load.py
   --------------------
   High-level data loading factory.

   Author: Hamza Abdelhedi


Functions
---------

.. autoapisummary::

   coco_pipe.io.load.load_data


Module Contents
---------------

.. py:function:: load_data(path = None, mode = 'auto', target_col = None, index_col = None, sep = '\t', header = 0, sheet_name = 0, columns_to_dims = None, col_sep = '_', meta_columns = None, clean = False, clean_kwargs = None, task = None, session = None, runs = None, datatype = 'eeg', suffix = None, loading_mode = 'epochs', window_length = None, stride = None, event_id = None, tmin = -0.2, tmax = 0.5, baseline = None, drop_short_epochs = True, units = None, subject_metadata_df = None, subject_key = None, pattern = '*.pkl', dims = ('obs', 'feature'), coords = None, run = None, processing = None, reader = None, id_fn = None, subjects = None, config = None, **kwargs)

   Universal data loader factory.

   Dispatches to `BIDSDataset`, `TabularDataset`, or `EmbeddingDataset` based on `mode`.

   :param path: Path to data source (file or directory). Required unless ``config``
      is given (in which case ``config.path`` is used).
   :type path: str or Path, optional
   :param mode: Type of data to load.

      - "auto": Infers type from file extension or directory structure.
        A directory with ``dataset_description.json`` or ``sub-*`` entries is
        treated as ``"bids"``; ``.csv``/``.tsv``/``.xls``/``.xlsx``/``.txt``
        files as ``"tabular"``; everything else as ``"embedding"``.
      - "tabular": uses `TabularDataset` (CSV, TSV, Excel, TXT).
      - "bids": uses `BIDSDataset` (BIDS-compliant directories).
      - "embedding": uses `EmbeddingDataset` (NPY, PKL, H5, JSON).
   :type mode: {"auto", "tabular", "bids", "embedding"}, default="auto"
   :param config: A pre-validated configuration object (see :mod:`coco_pipe.io.config`).
      When provided, its fields drive the load and ``mode`` is taken from the
      config; the matching keyword arguments below are ignored. When omitted,
      the relevant keyword arguments are assembled into a config and validated
      before dispatch. The non-serializable parameters ``reader``, ``id_fn``,
      ``subject_metadata_df``, and ``subject_key`` are always passed through
      directly and are never part of the config schema.
   :type config: DatasetConfig or {Tabular,BIDS,Embedding}Config, optional

   .. rubric:: Tabular Arguments (mode="tabular")

   :param target_col: Name of the column to extract as target `y`. Removed from features `X`.
   :type target_col: str, optional
   :param index_col: Column to use as index (observation IDs).
   :type index_col: str or int, optional
   :param sep: Separator for text files (e.g. ',' for CSV).
   :type sep: str, default='\t'
   :param header: Row number(s) to use as column names.
   :type header: int or list of int, default=0
   :param sheet_name: Sheet name or index for Excel files.
   :type sheet_name: str or int, default=0
   :param columns_to_dims: If provided, attempts to reshape 2D feature columns into
      N-D dimensions. Column names must follow the pattern `dim1_dim2_..._feature`.
   :type columns_to_dims: list of str, optional
   :param col_sep: Separator used in column names for reshaping.
   :type col_sep: str, default='_'
   :param meta_columns: Columns to extract as metadata coordinates instead of features.
   :type meta_columns: list of str, optional
   :param clean: Whether to perform automated cleaning (drop NaNs/Infs).
   :type clean: bool, default=False
   :param clean_kwargs: Arguments passed to `TabularDataset.clean`.
   :type clean_kwargs: dict, optional

   .. rubric:: BIDS Arguments (mode="bids")

   :param task: BIDS task name (e.g., 'rest', 'audiovisual').
   :type task: str, optional
   :param session: Session ID(s) to load. Defaults to all available.
   :type session: str or List[str], optional
   :param datatype: Data type folder (e.g., 'eeg', 'meg', 'ieeg').
   :type datatype: str, default='eeg'
   :param suffix: File suffix to load (e.g., 'eeg', 'epo', 'ave').
   :type suffix: str, optional
   :param loading_mode: How to process the data. Renamed to ``loading_mode`` here
      (and in ``BIDSConfig``) to avoid colliding with this function's ``mode``
      argument; it is passed through as ``mode`` to ``BIDSDataset``.

      - 'epochs': Slices continuous data into fixed-length windows.
      - 'continuous': Loads as single continuous segments.
      - 'load_existing': Loads pre-computed epochs.
   :type loading_mode: str, default='epochs'
   :param window_length: Window length in seconds (for 'epochs' mode).
   :type window_length: float, optional
   :param stride: Stride in seconds (for 'epochs' mode).
   :type stride: float, optional
   :param subject_metadata_df: External subject-level metadata to merge by subject
      during BIDS loading.
   :type subject_metadata_df: DataFrame, optional
   :param subject_key: Column in `subject_metadata_df` containing the BIDS subject identifier.
   :type subject_key: str, optional
   :param subjects: Specific subject IDs to load (without 'sub-').
   :type subjects: str or List[str], optional

   .. rubric:: Embedding Arguments (mode="embedding")

   :param pattern: Glob pattern to match files.
   :type pattern: str, default='\*.pkl'
   :param dims: Dimension labels for the data arrays.
   :type dims: tuple of str, default=('obs', 'feature')
   :param coords: Dictionary of coordinates for dimensions.
   :type coords: dict, optional
   :param reader: Custom file reader function.
   :type reader: callable, optional
   :param id_fn: Custom subject ID extraction function.
   :type id_fn: callable, optional
   :param subjects: If int, loads first N subjects. If list, filters by ID.
   :type subjects: int or list, optional

   :returns: Standardized data container with attributes:

      - X: (N_obs, ...) data array
      - y: Targets (if available)
      - ids: Observation identifiers
      - coords: Coordinate metadata
   :rtype: DataContainer

   .. rubric:: Examples

   There are two equivalent ways to load data. The keyword form is convenient
   for quick, interactive use:

   >>> container = load_data("features.csv", mode="tabular", target_col="y")

   The **config-first** form is recommended for pipelines and reproducible runs:
   a :class:`~coco_pipe.io.config.TabularConfig` /
   :class:`~coco_pipe.io.config.BIDSConfig` /
   :class:`~coco_pipe.io.config.EmbeddingConfig` is validated once and can be
   serialized, version-controlled, and reused. It also keeps each mode's options
   self-contained instead of mixing all three modes' keywords:

   >>> from coco_pipe.io.config import TabularConfig
   >>> cfg = TabularConfig(path="features.csv", target_col="y")
   >>> container = load_data(config=cfg)

   BIDS loading uses ``loading_mode`` (not ``mode``) to choose the windowing
   strategy:

   >>> container = load_data(
   ...     "/data/bids",
   ...     mode="bids",
   ...     task="rest",
   ...     loading_mode="epochs",
   ...     window_length=2.0,
   ... )
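
The ``"auto"`` inference rules described for ``mode`` can be pictured with a small standalone sketch. It is not the code used by ``coco_pipe``: the names ``infer_mode`` and ``TABULAR_EXTENSIONS`` are hypothetical, and the case-insensitive extension match is an assumption. Only the decision order (BIDS markers in a directory, then tabular extensions, then the embedding fallback) follows the parameter description.

```python
import tempfile
from pathlib import Path

# Extensions the documentation lists as "tabular" (constant name is hypothetical).
TABULAR_EXTENSIONS = {".csv", ".tsv", ".xls", ".xlsx", ".txt"}


def infer_mode(path):
    """Hypothetical helper mirroring the documented "auto" rules.

    This is an illustration only, not part of coco_pipe.
    """
    p = Path(path)
    if p.is_dir():
        # A directory counts as BIDS if it has a dataset description
        # or at least one "sub-*" entry.
        has_description = (p / "dataset_description.json").is_file()
        has_subject_entries = any(p.glob("sub-*"))
        if has_description or has_subject_entries:
            return "bids"
        return "embedding"
    # Assumption: extensions are compared case-insensitively.
    if p.suffix.lower() in TABULAR_EXTENSIONS:
        return "tabular"
    # Everything else falls back to "embedding".
    return "embedding"


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        bids_root = root / "study"
        (bids_root / "sub-01").mkdir(parents=True)
        plain_dir = root / "vectors"
        plain_dir.mkdir()
        for candidate in (bids_root, plain_dir, root / "features.csv", root / "emb.npy"):
            print(candidate.name, "->", infer_mode(candidate))
```

Passing an explicit ``mode`` (or a config object) skips this kind of guesswork, which matters for directories that hold embeddings next to folders whose names happen to start with ``sub-``.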