.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "auto_examples/io/plot_01_data_container.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        :ref:`Go to the end <sphx_glr_download_auto_examples_io_plot_01_data_container.py>`
        to download the full example code.

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_auto_examples_io_plot_01_data_container.py:


====================
Data Structures Demo
====================

This example demonstrates ``DataContainer``, a core IO data structure in the
coco-pipe package. ``DataContainer`` wraps an N-dimensional NumPy array and keeps
track of its dimension names, the coordinates along each dimension, and
per-observation labels.

.. GENERATED FROM PYTHON SOURCE LINES 13-16

Imports
-------
First, let's import the necessary libraries.

.. GENERATED FROM PYTHON SOURCE LINES 16-21

.. code-block:: Python


    import numpy as np

    from coco_pipe.io.structures import DataContainer

.. GENERATED FROM PYTHON SOURCE LINES 22-26

1. Tabular Data (2D)
--------------------
We can store standard 2D tabular data (Observations x Features). Each dimension
gets a name in ``dims`` and a list of labels in ``coords``, and the container
keeps these coordinates aligned with the data.

.. GENERATED FROM PYTHON SOURCE LINES 26-39

.. code-block:: Python


    X_tab = np.random.randn(5, 3)
    container_tab = DataContainer(
        X=X_tab,
        dims=("obs", "feature"),
        coords={
            "obs": [f"sub-{i}" for i in range(5)],
            "feature": ["Alpha_Cz", "Alpha_Fz", "Beta_Pz"],
        },
    )
    print(f"Original container:\n{container_tab}")


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    Original container:

.. GENERATED FROM PYTHON SOURCE LINES 40-42

Coordinates can be selected with wildcard patterns. Let's select all features
starting with "Alpha":

.. GENERATED FROM PYTHON SOURCE LINES 42-47

.. code-block:: Python


    subset = container_tab.select(feature=["Alpha*"])
    print(f"Selected (Alpha*):\n{subset}")


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    Selected (Alpha*):

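
Conceptually, a wildcard selection like the one above can be thought of as a
boolean mask built from the coordinate labels and applied to both the data axis
and the coordinate list, which is what keeps the two aligned. The following is a
minimal NumPy sketch of that idea, not coco-pipe's implementation: the helper
``select_by_pattern`` is hypothetical, and shell-style matching via the
standard-library ``fnmatch`` module is an assumption about how patterns are
interpreted.

.. code-block:: Python

    import fnmatch

    import numpy as np


    def select_by_pattern(X, coords, axis, patterns):
        """Hypothetical helper (not coco-pipe's API): keep the coordinates that
        match any shell-style pattern, plus the matching slices of X along axis."""
        # Case-sensitive matching; fnmatch.fnmatch would follow the OS case rules
        mask = np.array(
            [any(fnmatch.fnmatchcase(str(c), p) for p in patterns) for c in coords],
            dtype=bool,
        )
        X_sel = np.compress(mask, X, axis=axis)  # the same mask selects the data...
        coords_sel = [c for c, keep in zip(coords, mask) if keep]  # ...and the labels
        return X_sel, coords_sel


    X_demo = np.arange(15.0).reshape(5, 3)  # 5 observations x 3 features
    features = ["Alpha_Cz", "Alpha_Fz", "Beta_Pz"]
    X_alpha, alpha_features = select_by_pattern(X_demo, features, axis=1, patterns=["Alpha*"])
    print(X_alpha.shape, alpha_features)  # (5, 2) ['Alpha_Cz', 'Alpha_Fz']

Because one mask indexes both the array and the label list, the selected columns
and their names cannot drift out of sync.
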

.. GENERATED FROM PYTHON SOURCE LINES 48-54

2. EEG Data (3D)
----------------
``DataContainer`` also handles multi-dimensional data such as EEG epochs, which
are typically organized as (Observations x Channels x Time).
Let's simulate data for 2 subjects, 2 conditions, and 4 epochs each
(2 x 2 x 4 = 16 observations).

.. GENERATED FROM PYTHON SOURCE LINES 54-85

.. code-block:: Python


    n_subs = 2
    cond_labels = ["A", "B"]
    n_conds = len(cond_labels)
    n_epochs = 4
    n_obs = n_subs * n_conds * n_epochs  # 16 observations
    n_chans = 3
    n_times = 10

    X_eeg = np.random.randn(n_obs, n_chans, n_times)

    # Build one ID and one condition label per observation
    ids = []
    conditions = []
    for sub in range(n_subs):
        for cond in cond_labels:
            for ep in range(n_epochs):
                ids.append(f"sub-{sub}_cond-{cond}_ep-{ep}")
                conditions.append(cond)

    container_eeg = DataContainer(
        X=X_eeg,
        y=np.array(conditions),
        ids=np.array(ids),
        dims=("obs", "channel", "time"),
        coords={"obs": ids, "channel": ["Fz", "Cz", "Pz"], "time": np.arange(n_times)},
    )
    print(f"EEG Container:\n{container_eeg}")
    print(f"First 5 IDs:\n{container_eeg.ids[:5]}")


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    EEG Container:
    First 5 IDs:
    ['sub-0_cond-A_ep-0' 'sub-0_cond-A_ep-1' 'sub-0_cond-A_ep-2'
     'sub-0_cond-A_ep-3' 'sub-0_cond-B_ep-0']

.. GENERATED FROM PYTHON SOURCE LINES 86-94

3. Flattening Data
------------------
We often need to flatten high-dimensional data before modeling: standard machine
learning algorithms (like PCA or classifiers) expect a 2D matrix, while spatial
methods such as TRCA need the channel dimension kept. ``flatten()`` keeps the
dimensions listed in ``preserve`` and merges all the others into a single
``feature`` dimension.

**Flatten for TRCA (Spatial)**: Keep Observations and Channels, flatten Time.
Result: (16, 3, 10) -> (Obs, Chan, Feature=Time)

.. GENERATED FROM PYTHON SOURCE LINES 94-101

.. code-block:: Python


    # Preserve both "obs" and "channel"; only "time" becomes the feature dimension
    flat_spatial = container_eeg.flatten(preserve=["obs", "channel"])
    print(
        f"Flattened (Spatial): {flat_spatial.shape} dims={flat_spatial.dims} | "
        f"Coords: {list(flat_spatial.coords.keys())}"
    )

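
In plain NumPy terms, both flattening modes in this section amount to moving the
preserved axes to the front and reshaping the remaining axes into one. The sketch
below assumes row-major (C-order) flattening, in which time varies fastest within
each channel; that ordering is consistent with the composite names ``Fz_0, Fz_1,
...`` reported by this section's standard 2D example. The helper ``flatten_keep``
is hypothetical and only illustrates the idea; it is not coco-pipe's
implementation.

.. code-block:: Python

    from itertools import product

    import numpy as np


    def flatten_keep(X, dims, coords, preserve):
        """Hypothetical helper (not coco-pipe's API): merge every dimension not
        listed in ``preserve`` into a single trailing "feature" dimension."""
        keep = [dims.index(d) for d in preserve]
        rest = [i for i in range(X.ndim) if i not in keep]
        # Move preserved axes to the front, then collapse the remaining axes (C order)
        X_flat = np.transpose(X, keep + rest).reshape([X.shape[i] for i in keep] + [-1])
        # Composite names in the same order as the reshape: last axis varies fastest
        names = [
            "_".join(str(c) for c in combo)
            for combo in product(*(coords[dims[i]] for i in rest))
        ]
        return X_flat, tuple(preserve) + ("feature",), names


    X = np.arange(16 * 3 * 10, dtype=float).reshape(16, 3, 10)
    dims = ("obs", "channel", "time")
    coords = {
        "obs": [f"obs-{i}" for i in range(16)],
        "channel": ["Fz", "Cz", "Pz"],
        "time": list(range(10)),
    }

    X2d, dims2d, feats = flatten_keep(X, dims, coords, preserve=["obs"])
    print(X2d.shape, dims2d, feats[:3])  # (16, 30) ('obs', 'feature') ['Fz_0', 'Fz_1', 'Fz_2']

    X3d, dims3d, _ = flatten_keep(X, dims, coords, preserve=["obs", "channel"])
    print(X3d.shape, dims3d)  # (16, 3, 10) ('obs', 'channel', 'feature')

Building the names with ``itertools.product`` over the merged axes guarantees that
column ``k`` of the flattened matrix and ``names[k]`` refer to the same
(channel, time) pair.
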

.. GENERATED FROM PYTHON SOURCE LINES 102-104

**Flatten for Standard ML (2D)**: Keep Observations only.
Result: (16, 3*10) -> (16, 30) -> (Obs, Feature=Chan*Time)

.. GENERATED FROM PYTHON SOURCE LINES 104-110

.. code-block:: Python


    flat_ml = container_eeg.flatten(preserve=["obs"])
    print(f"Flattened (Standard 2D): {flat_ml.shape} dims={flat_ml.dims}")
    print(f"Sample Composite Features:\n{flat_ml.coords['feature'][:5]}")


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    Flattened (Standard 2D): (16, 30) dims=('obs', 'feature')
    Sample Composite Features:
    ['Fz_0', 'Fz_1', 'Fz_2', 'Fz_3', 'Fz_4']

.. GENERATED FROM PYTHON SOURCE LINES 111-115

4. Aggregation
--------------
You can aggregate data over coordinates or labels. Let's average all observations
that share the same condition label (A or B).

.. GENERATED FROM PYTHON SOURCE LINES 115-120

.. code-block:: Python


    agg_cond = container_eeg.aggregate(by=container_eeg.y, stats="mean")
    print(f"Aggregated by Condition (A, B): {agg_cond.shape}\nIDs={agg_cond.ids}")


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    Aggregated by Condition (A, B): (2, 3, 10)
    IDs=[np.str_('A') np.str_('B')]

.. GENERATED FROM PYTHON SOURCE LINES 121-127

5. Advanced Selection
---------------------
The ``select()`` method supports wildcard patterns, case-insensitive and fuzzy
matching, comparison operators, and custom callables.

**Wildcard Epoch Selection**

.. GENERATED FROM PYTHON SOURCE LINES 127-132

.. code-block:: Python


    subset_epochs = container_eeg.select(obs=["*ep-0", "*ep-1"])
    print(f"Selected (*ep-0, *ep-1): {subset_epochs.shape} from {container_eeg.shape}")
    print(f"Selected IDs:\n{subset_epochs.ids}")


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    Selected (*ep-0, *ep-1): (8, 3, 10) from (16, 3, 10)
    Selected IDs:
    ['sub-0_cond-A_ep-0' 'sub-0_cond-A_ep-1' 'sub-0_cond-B_ep-0'
     'sub-0_cond-B_ep-1' 'sub-1_cond-A_ep-0' 'sub-1_cond-A_ep-1'
     'sub-1_cond-B_ep-0' 'sub-1_cond-B_ep-1']

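
The output above keeps 8 of the 16 observations, every ``ep-0`` and ``ep-1``
epoch, which suggests that a list of patterns is combined with a logical OR and
that the original observation order is preserved. A minimal sketch of that
behavior in plain Python, assuming shell-style wildcards as implemented by the
standard-library ``fnmatch`` module (the helper ``match_any`` is hypothetical, not
part of coco-pipe):

.. code-block:: Python

    from fnmatch import fnmatchcase


    def match_any(labels, patterns):
        """Hypothetical helper: True where a label matches at least one pattern."""
        return [any(fnmatchcase(label, p) for p in patterns) for label in labels]


    # Same ID layout as the EEG container above: 2 subjects x 2 conditions x 4 epochs
    ids = [
        f"sub-{s}_cond-{c}_ep-{e}"
        for s in range(2)
        for c in ["A", "B"]
        for e in range(4)
    ]
    mask = match_any(ids, ["*ep-0", "*ep-1"])
    kept = [i for i, m in zip(ids, mask) if m]  # storage order, not pattern order
    print(len(kept))  # 8
    print(kept[:3])  # ['sub-0_cond-A_ep-0', 'sub-0_cond-A_ep-1', 'sub-0_cond-B_ep-0']

Under these ``fnmatch`` rules a pattern must match the whole label, so
``"*ep-1"`` would not also pick up an ``ep-10`` epoch.
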

.. GENERATED FROM PYTHON SOURCE LINES 133-134

**Case-Insensitive Selection**

.. GENERATED FROM PYTHON SOURCE LINES 134-138

.. code-block:: Python


    subset_ci = container_eeg.select(channel=["fz"], ignore_case=True, fuzzy=False)
    print(f"Case-Insensitive 'fz' -> {subset_ci.coords['channel']}")


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    Case-Insensitive 'fz' -> ['Fz']

.. GENERATED FROM PYTHON SOURCE LINES 139-140

**Operator Selection (e.g., Time >= 5)**

.. GENERATED FROM PYTHON SOURCE LINES 140-144

.. code-block:: Python


    subset_time = container_eeg.select(time={">=": 5})
    print(f"Time >= 5 -> {subset_time.coords['time']}")


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    Time >= 5 -> [5 6 7 8 9]

.. GENERATED FROM PYTHON SOURCE LINES 145-146

**Filter by Target Label (Y)**

.. GENERATED FROM PYTHON SOURCE LINES 146-150

.. code-block:: Python


    subset_cond = container_eeg.select(y=["B"])
    print(f"Select Y='B' -> IDs:\n{subset_cond.ids[:3]}... (Total {subset_cond.shape[0]})")


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    Select Y='B' -> IDs:
    ['sub-0_cond-B_ep-0' 'sub-0_cond-B_ep-1' 'sub-0_cond-B_ep-2']... (Total 8)

.. GENERATED FROM PYTHON SOURCE LINES 151-153

**Stratified Selection via Callable**

Keep only the first 2 observations of each subject, in storage order (here, the
first two condition-A epochs of each subject).

.. GENERATED FROM PYTHON SOURCE LINES 153-171

.. code-block:: Python


    def first_n_per_subject(ids_array, n=2):
        """Custom selector: keep the first n observations of each subject.

        The subject is taken as the ID prefix before the first underscore
        (e.g. "sub-0"). Returns a boolean mask aligned with ``ids_array``.
        """
        subjects = [i.split("_")[0] for i in ids_array]
        mask = np.zeros(len(ids_array), dtype=bool)
        counts = {}
        for idx, sub in enumerate(subjects):
            if counts.get(sub, 0) < n:
                mask[idx] = True
                counts[sub] = counts.get(sub, 0) + 1
        return mask


    subset_strat = container_eeg.select(ids=lambda x: first_n_per_subject(x, n=2))
    print(f"First 2 epochs per subject:\n{subset_strat.ids}")


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    First 2 epochs per subject:
    ['sub-0_cond-A_ep-0' 'sub-0_cond-A_ep-1' 'sub-1_cond-A_ep-0'
     'sub-1_cond-A_ep-1']

.. GENERATED FROM PYTHON SOURCE LINES 172-176

6. Data Scaling and Normalization
---------------------------------
The container provides built-in normalization methods. These operations return a
new container holding the normalized data.

.. GENERATED FROM PYTHON SOURCE LINES 176-185

.. code-block:: Python


    # Z-score normalization (mean=0, std=1) computed over the time dimension
    zscored_eeg = container_eeg.zscore(dim="time")
    print(
        f"Z-scored EEG Data:\nMean: {np.mean(zscored_eeg.X):.3f},"
        f"\nStd: {np.std(zscored_eeg.X):.3f}"
    )


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    Z-scored EEG Data:
    Mean: -0.000,
    Std: 1.000

.. GENERATED FROM PYTHON SOURCE LINES 186-190

7. Restructuring Dimensions
---------------------------
``stack()`` merges several dimensions into one, and ``unstack()`` splits it back
into the original dimensions. Let's stack Observations and Channels into a single
"obs_chan" dimension (16 x 3 = 48 rows).

.. GENERATED FROM PYTHON SOURCE LINES 190-194

.. code-block:: Python


    stacked = container_eeg.stack(dims=["obs", "channel"], new_dim="obs_chan")
    print(f"Stacked (Obs+Chan): {stacked.shape} dims={stacked.dims}")


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    Stacked (Obs+Chan): (48, 10) dims=('obs_chan', 'time')

.. GENERATED FROM PYTHON SOURCE LINES 195-196

And unstack it back out:

.. GENERATED FROM PYTHON SOURCE LINES 196-201

.. code-block:: Python


    unstacked = stacked.unstack(dim="obs_chan")
    print(f"Unstacked back to: {unstacked.shape} dims={unstacked.dims}")


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    Unstacked back to: (16, 3, 10) dims=('obs', 'channel', 'time')

.. GENERATED FROM PYTHON SOURCE LINES 202-206

8. Working with Pandas
----------------------
For machine learning pipelines or exploratory data analysis (EDA), you can export
the per-observation metadata to a pandas DataFrame with ``observation_frame()``.

.. GENERATED FROM PYTHON SOURCE LINES 206-210

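
The IDs built for the EEG container encode subject, condition, and epoch, so
those can also be recovered as separate metadata columns. A minimal sketch using
only the standard library, assuming the ``sub-<n>_cond-<label>_ep-<n>`` convention
of this example (the helper ``parse_ids`` is hypothetical, not part of coco-pipe).
It returns a dict of equal-length lists, a form that ``pandas.DataFrame`` accepts
directly:

.. code-block:: Python

    import re

    # Assumed ID convention from this example: "sub-<n>_cond-<label>_ep-<n>"
    ID_PATTERN = re.compile(
        r"sub-(?P<subject>\d+)_cond-(?P<condition>[^_]+)_ep-(?P<epoch>\d+)"
    )


    def parse_ids(ids):
        """Hypothetical helper: split structured IDs into metadata columns."""
        columns = {"obs": [], "subject": [], "condition": [], "epoch": []}
        for obs_id in ids:
            m = ID_PATTERN.fullmatch(obs_id)
            if m is None:
                raise ValueError(f"ID does not follow the expected pattern: {obs_id!r}")
            columns["obs"].append(obs_id)
            columns["subject"].append(int(m["subject"]))
            columns["condition"].append(m["condition"])
            columns["epoch"].append(int(m["epoch"]))
        return columns


    meta = parse_ids(["sub-0_cond-A_ep-0", "sub-0_cond-B_ep-3", "sub-1_cond-A_ep-2"])
    print(meta["subject"], meta["condition"], meta["epoch"])  # [0, 0, 1] ['A', 'B', 'A'] [0, 3, 2]

With pandas installed, ``pandas.DataFrame(parse_ids(container_eeg.ids))`` would
give one row per observation with ``subject``, ``condition``, and ``epoch``
columns.
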

.. code-block:: Python


    df_obs = container_eeg.observation_frame()
    print("Observation DataFrame (First 5 rows):")
    print(df_obs.head())


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    Observation DataFrame (First 5 rows):
                     obs          sample_id
    0  sub-0_cond-A_ep-0  sub-0_cond-A_ep-0
    1  sub-0_cond-A_ep-1  sub-0_cond-A_ep-1
    2  sub-0_cond-A_ep-2  sub-0_cond-A_ep-2
    3  sub-0_cond-A_ep-3  sub-0_cond-A_ep-3
    4  sub-0_cond-B_ep-0  sub-0_cond-B_ep-0


.. rst-class:: sphx-glr-timing

   **Total running time of the script:** (0 minutes 0.013 seconds)


.. _sphx_glr_download_auto_examples_io_plot_01_data_container.py:

.. only:: html

  .. container:: sphx-glr-footer sphx-glr-footer-example

    .. container:: sphx-glr-download sphx-glr-download-jupyter

      :download:`Download Jupyter notebook: plot_01_data_container.ipynb <plot_01_data_container.ipynb>`

    .. container:: sphx-glr-download sphx-glr-download-python

      :download:`Download Python source code: plot_01_data_container.py <plot_01_data_container.py>`

    .. container:: sphx-glr-download sphx-glr-download-zip

      :download:`Download zipped: plot_01_data_container.zip <plot_01_data_container.zip>`


.. only:: html

 .. rst-class:: sphx-glr-signature

    Gallery generated by Sphinx-Gallery