coco_pipe.io.load_data
- coco_pipe.io.load_data(path=None, mode='auto', target_col=None, index_col=None, sep='\t', header=0, sheet_name=0, columns_to_dims=None, col_sep='_', meta_columns=None, clean=False, clean_kwargs=None, task=None, session=None, runs=None, datatype='eeg', suffix=None, loading_mode='epochs', window_length=None, stride=None, event_id=None, tmin=-0.2, tmax=0.5, baseline=None, drop_short_epochs=True, units=None, subject_metadata_df=None, subject_key=None, pattern='*.pkl', dims=('obs', 'feature'), coords=None, run=None, processing=None, reader=None, id_fn=None, subjects=None, config=None, **kwargs)
Universal data loader factory. Dispatches to BIDSDataset, TabularDataset, or EmbeddingDataset based on mode.
- Parameters:
path (str or Path, optional) – Path to data source (file or directory). Required unless config is given (in which case config.path is used).
mode ({"auto", "tabular", "bids", "embedding"}, default="auto") –
Type of data to load.
"auto": Infers the type from the file extension or directory structure. A directory with dataset_description.json or sub-* entries is treated as "bids"; .csv/.tsv/.xls/.xlsx/.txt files as "tabular"; everything else as "embedding".
"tabular": uses TabularDataset (CSV, TSV, Excel, TXT).
"bids": uses BIDSDataset (BIDS-compliant directories).
"embedding": uses EmbeddingDataset (NPY, PKL, H5, JSON).
config (DatasetConfig or {Tabular,BIDS,Embedding}Config, optional) – A pre-validated configuration object (see coco_pipe.io.config). When provided, its fields drive the load and mode is taken from the config; the matching keyword arguments below are ignored. When omitted, the relevant keyword arguments are assembled into a config and validated before dispatch. The non-serializable parameters reader, id_fn, subject_metadata_df, and subject_key are always passed through directly and are never part of the config schema.
Tabular Arguments (mode="tabular")
----------------------------------
target_col (str, optional) – Name of the column to extract as target y. Removed from features X.
index_col (str or int, optional) – Column to use as index (observation IDs).
sep (str, default='\t') – Separator for text files (e.g. ‘,’ for CSV).
header (int or list of int, default=0) – Row number(s) to use as column names.
sheet_name (str or int, default=0) – Sheet name or index for Excel files.
columns_to_dims (list of str, optional) – If provided, attempts to reshape 2D feature columns into N-D dimensions. Columns must follow: dim1_dim2_…_feature.
col_sep (str, default='_') – Separator used in column names for reshaping.
meta_columns (list of str, optional) – Columns to extract as metadata coordinates instead of features.
clean (bool, default=False) – Whether to perform automated cleaning (drop NaNs/Infs).
clean_kwargs (dict, optional) – Arguments passed to TabularDataset.clean.
BIDS Arguments (mode="bids")
----------------------------
task (str, optional) – BIDS task name (e.g., ‘rest’, ‘audiovisual’).
session (str or List[str], optional) – Session ID(s) to load. Defaults to all available.
datatype (str, default='eeg') – Data type folder (e.g., ‘eeg’, ‘meg’, ‘ieeg’).
suffix (str, optional) – File suffix to load (e.g., ‘eeg’, ‘epo’, ‘ave’).
loading_mode (str, default='epochs') –
How to process the data. Renamed to loading_mode here (and in BIDSConfig) to avoid colliding with this function's mode argument; it is passed through as mode to BIDSDataset.
'epochs': Slices continuous data into fixed-length windows.
'continuous': Loads as single continuous segments.
'load_existing': Loads pre-computed epochs.
window_length (float, optional) – Window length in seconds (for ‘epochs’ mode).
stride (float, optional) – Stride in seconds (for ‘epochs’ mode).
subject_metadata_df (DataFrame, optional) – External subject-level metadata to merge by subject during BIDS loading.
subject_key (str, optional) – Column in subject_metadata_df containing the BIDS subject identifier.
subjects (int or list, optional) – Specific subject IDs to load (without the 'sub-' prefix).
Embedding Arguments (mode="embedding")
--------------------------------------
pattern (str, default=r'*.pkl') – Glob pattern to match files.
dims (tuple of str, default=('obs', 'feature')) – Dimension labels for the data arrays.
coords (dict, optional) – Dictionary of coordinates for dimensions.
reader (callable, optional) – Custom file reader function.
id_fn (callable, optional) – Custom subject ID extraction function.
subjects – If int, loads first N subjects. If list, filters by ID.
tmin (float)
tmax (float)
drop_short_epochs (bool)
units (str | None)
run (str | None)
processing (str | None)
- Returns:
Standardized data container with attributes:
  - X: (N_obs, …) data array
  - y: Targets (if available)
  - ids: Observation identifiers
  - coords: Coordinate metadata
- Return type:
Examples
Two equivalent ways to load. The keyword form is convenient for quick, interactive use:
>>> container = load_data("features.csv", mode="tabular", target_col="y")
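The call above names the mode explicitly. With the default mode="auto", the documented inference rule can be sketched in plain Python as follows. This is only an illustration of the rule as described under mode, not the library's implementation: `infer_mode` and `TABULAR_EXTENSIONS` are hypothetical names, and matching extensions case-insensitively is an assumption.

```python
import tempfile
from pathlib import Path

# Extensions the documentation lists as tabular inputs.
TABULAR_EXTENSIONS = {".csv", ".tsv", ".xls", ".xlsx", ".txt"}


def infer_mode(path):
    """Guess a loader mode from a path (hypothetical helper, not coco_pipe API)."""
    p = Path(path)
    if p.is_dir():
        # A directory counts as BIDS if it holds dataset_description.json
        # or at least one sub-* entry.
        has_description = (p / "dataset_description.json").is_file()
        has_subjects = any(p.glob("sub-*"))
        if has_description or has_subjects:
            return "bids"
    elif p.suffix.lower() in TABULAR_EXTENSIONS:
        # Assumption: extension matching ignores case.
        return "tabular"
    # Everything else is handed to the embedding loader.
    return "embedding"


# Usage: the same directory changes mode once a sub-* entry appears.
with tempfile.TemporaryDirectory() as root:
    empty_dir_mode = infer_mode(root)
    (Path(root) / "sub-01").mkdir()
    bids_mode = infer_mode(root)

print(infer_mode("features.csv"), empty_dir_mode, bids_mode)
```

Note that under this rule a directory without BIDS markers falls through to "embedding", so passing mode explicitly is the safer choice when a directory layout is ambiguous.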
The config-first form is recommended for pipelines and reproducible runs: a TabularConfig/BIDSConfig/EmbeddingConfig is validated once and can be serialized, version-controlled, and reused. It also keeps each mode's options self-contained instead of mixing all three modes' keywords:
>>> from coco_pipe.io.config import TabularConfig
>>> cfg = TabularConfig(path="features.csv", target_col="y")
>>> container = load_data(config=cfg)
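For tabular inputs, columns_to_dims expects column names of the form dim1_dim2_…_feature, split on col_sep. The sketch below shows one way such names could be parsed; it illustrates the naming convention only. `split_column` is a hypothetical helper, the channel/band labels are made-up example data, and allowing the trailing feature name to contain the separator is an assumption, not documented behaviour.

```python
def split_column(name, dims, col_sep="_"):
    """Split 'dim1_dim2_..._feature' into ({dim: label}, feature).

    Hypothetical helper for illustration; not part of coco_pipe.
    """
    # Peel off one label per dimension; whatever remains is the feature name.
    # Assumption: the feature name itself may contain the separator.
    parts = name.split(col_sep, len(dims))
    if len(parts) != len(dims) + 1:
        raise ValueError(f"{name!r} does not match dim1{col_sep}...{col_sep}feature")
    labels = dict(zip(dims, parts[:-1]))
    return labels, parts[-1]


# Usage with made-up column names: two dimensions, then the feature.
columns = ["Fz_alpha_power", "Fz_beta_power", "Cz_alpha_rel_power"]
parsed = [split_column(c, dims=["channel", "band"]) for c in columns]
for labels, feature in parsed:
    print(labels, feature)
```

A column with too few separator-delimited parts raises ValueError here, which mirrors the documented requirement that every feature column follow the pattern.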
BIDS loading uses loading_mode (not mode) to choose the windowing strategy:
>>> container = load_data(
...     "/data/bids",
...     mode="bids",
...     task="rest",
...     loading_mode="epochs",
...     window_length=2.0,
... )
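To make window_length and stride concrete, here is a minimal sketch of fixed-length windowing over a continuous recording. It is not BIDSDataset's actual logic: `window_bounds` is a hypothetical helper, and it assumes that a missing stride means non-overlapping windows and that drop_short_epochs discards a trailing window shorter than window_length.

```python
def window_bounds(n_samples, sfreq, window_length, stride=None, drop_short=True):
    """Return (start, stop) sample indices for fixed-length windows.

    Hypothetical helper; window_length and stride are in seconds,
    sfreq is the sampling frequency in Hz.
    """
    win = int(round(window_length * sfreq))
    # Assumption: no stride given means back-to-back, non-overlapping windows.
    step = int(round((window_length if stride is None else stride) * sfreq))
    if win <= 0 or step <= 0:
        raise ValueError("window_length and stride must be positive")

    bounds = []
    for start in range(0, n_samples, step):
        stop = min(start + win, n_samples)
        if stop - start < win and drop_short:
            # Every later window would be short as well, so stop here.
            break
        bounds.append((start, stop))
    return bounds


# Usage: 10 s of data at 100 Hz, 2 s windows.
no_overlap = window_bounds(n_samples=1000, sfreq=100.0, window_length=2.0)
half_overlap = window_bounds(n_samples=1000, sfreq=100.0, window_length=2.0, stride=1.0)
print(len(no_overlap), len(half_overlap))
```

Under these assumptions, a stride smaller than window_length yields overlapping windows and therefore more observations from the same recording.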