Data & IO

The coco_pipe.io module is the data backbone of coco-pipe. It loads tabular files, BIDS datasets, and embedding derivatives into a single labelled structure, DataContainer, that every other module consumes, and it provides quality-control and persistence utilities around that structure.

Design Philosophy

One in-memory contract, everywhere. Whether your data starts as a tabular feature table, a BIDS M/EEG dataset, or an array of foundation-model embeddings, it is loaded into the same DataContainer, so feature extraction, reduction, decoding, visualization, and reporting all compose without bespoke glue code.

Key Features

  • One loader, load_data(), with auto-detected modes for tabular files, BIDS datasets, and embedding derivatives.

  • A labelled, dimension-aware container with tidy selection, aggregation, and normalization helpers.

  • Inline data-quality checks (missingness, constant columns, outliers, flatlines) and a one-call run_qc().

  • Typed dataset configs (DatasetConfig, BIDSConfig, TabularConfig, EmbeddingConfig).

1. The DataContainer

A DataContainer wraps a numeric array X with named dimensions, coordinate labels, optional targets y, observation ids, and free-form meta.

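import numpy as np
from coco_pipe.io import DataContainer

# Placeholder targets and ids so the example runs; substitute your own.
labels = np.random.randint(0, 2, size=100)
subject_ids = [f"sub-{i:03d}" for i in range(100)]

container = DataContainer(
    X=np.random.randn(100, 32),       # (obs, feature)
    dims=("obs", "feature"),
    y=labels,                          # optional targets
    ids=subject_ids,                   # optional observation ids
)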

container.X            # the numeric array
container.y            # targets
container.shape        # array shape
container.ids          # observation ids

Common transformations return new containers (nothing mutates in place):

sub = container.select(feature=["alpha", "beta"])   # label-based selection
sub = container.isel(obs=slice(0, 10))              # positional selection
z = container.zscore(dim="obs")                      # normalize
pooled = container.aggregate_groups("subject", how="mean")
flat = container.flatten()                           # collapse to 2D (obs, feature)

container.save("data.pkl")
restored = DataContainer.load("data.pkl")
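
To make the "nothing mutates in place" contract concrete, here is a minimal sketch of the pattern using only the standard library. MiniContainer and its methods are hypothetical stand-ins, not coco-pipe's implementation, and the sketch assumes that z-scoring along "obs" means standardizing each feature across observations.

```python
from dataclasses import dataclass, replace
from statistics import mean, pstdev


@dataclass(frozen=True)
class MiniContainer:
    """Hypothetical stand-in for a labelled 2D container; not coco-pipe's class."""
    X: tuple           # rows of floats, shape (obs, feature)
    features: tuple    # one label per column

    def select(self, feature):
        """Label-based column selection; returns a new container."""
        idx = [self.features.index(name) for name in feature]
        rows = tuple(tuple(row[i] for i in idx) for row in self.X)
        return replace(self, X=rows, features=tuple(feature))

    def zscore(self):
        """Standardize each feature column across observations (assumed meaning)."""
        cols = list(zip(*self.X))
        mus = [mean(c) for c in cols]
        sds = [pstdev(c) or 1.0 for c in cols]   # guard constant columns
        rows = tuple(
            tuple((v - m) / s for v, m, s in zip(row, mus, sds))
            for row in self.X
        )
        return replace(self, X=rows)


original = MiniContainer(
    X=((1.0, 10.0), (2.0, 20.0), (3.0, 30.0)),
    features=("alpha", "beta"),
)
alpha_only = original.select(feature=["alpha"])   # new object, one column
standardized = original.zscore()                  # new object, same labels
```

Because the dataclass is frozen and every method builds its result with replace(), original keeps its values and labels after both calls, which is the property that lets transformations be chained safely.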

2. Loading Data

load_data() is the single entry point. mode="auto" infers the source type; pass an explicit mode or a typed config for full control.

from coco_pipe.io import load_data

# Tabular feature table (CSV / parquet / Excel); columns map to dimensions.
container = load_data("features.csv", target_col="diagnosis")

# BIDS M/EEG dataset, epoched on load (requires the [eeg] extra).
container = load_data(
    "/data/bids", mode="bids", datatype="eeg",
    loading_mode="epochs", tmin=-0.2, tmax=0.5,
)

# Precomputed embedding derivatives.
container = load_data("/data/embeddings", mode="embeddings")
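
The source does not describe how mode="auto" decides between source types. As an illustration only, the sketch below shows one plausible set of rules: infer_mode and its fallback order are hypothetical, not coco-pipe's logic. The one grounded rule is that a BIDS dataset keeps a dataset_description.json file at its root.

```python
import json
import tempfile
from pathlib import Path

# File types named in the loader comment above: CSV, parquet, Excel.
TABULAR_SUFFIXES = {".csv", ".parquet", ".xlsx", ".xls"}


def infer_mode(path):
    """Guess a loader mode from a path (hypothetical rules, for illustration)."""
    p = Path(path)
    if p.is_file() and p.suffix.lower() in TABULAR_SUFFIXES:
        return "tabular"
    if p.is_dir() and (p / "dataset_description.json").is_file():
        # BIDS datasets carry this file at the dataset root.
        return "bids"
    if p.is_dir():
        # Assumption: any other directory holds embedding derivatives.
        return "embeddings"
    raise ValueError(f"Cannot infer a loading mode for {p}")


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    table = root / "features.csv"
    table.write_text("subject,alpha,diagnosis\ns01,0.4,1\n")

    bids_root = root / "bids"
    bids_root.mkdir()
    (bids_root / "dataset_description.json").write_text(
        json.dumps({"Name": "demo", "BIDSVersion": "1.8.0"})
    )

    emb_root = root / "embeddings"
    emb_root.mkdir()

    detected = [infer_mode(table), infer_mode(bids_root), infer_mode(emb_root)]
```

Heuristics like these can misfire, for example when a derivatives folder ships its own dataset_description.json, which is one reason to pass an explicit mode when the layout is unusual.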

For reproducible, validated loading, pass a config object instead of loose keyword arguments:

from coco_pipe.io import TabularConfig

container = load_data(config=TabularConfig(path="features.csv", target_col="diagnosis"))
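
The value of a typed config is that mistakes surface when the config is built rather than midway through loading. The sketch below illustrates that idea with a frozen dataclass. TabularConfigSketch is hypothetical: it borrows the two field names from the example above, and its validation rules are assumptions rather than the behaviour of coco-pipe's TabularConfig.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Optional


@dataclass(frozen=True)
class TabularConfigSketch:
    """Hypothetical config; field names mirror the example above, nothing more."""
    path: str
    target_col: Optional[str] = None

    def __post_init__(self):
        # Validate eagerly so a bad config fails at construction time.
        if not self.path:
            raise ValueError("path must be a non-empty string")
        suffix = Path(self.path).suffix.lower()
        if suffix not in {".csv", ".parquet", ".xlsx"}:
            raise ValueError(f"unsupported tabular format: {suffix!r}")


config = TabularConfigSketch(path="features.csv", target_col="diagnosis")

try:
    TabularConfigSketch(path="features.txt")
    rejected = False
except ValueError:
    rejected = True   # the unsupported suffix is caught before any IO happens
```

Freezing the dataclass also means a config cannot drift after it has been logged or serialized, which supports reproducible loading.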

3. Quality Control

Run the standard quality suite over a container in one call. It returns the (optionally cleaned) container plus a QCResult:

from coco_pipe.io import run_qc

container, qc = run_qc(container, subject_col="subject")
print(qc)   # missingness, outlier, and flatline findings

The individual checks live in coco_pipe.io.quality and each returns a CheckResult:

Check                                            What it flags
coco_pipe.io.quality.check_missingness()         Fraction of NaN values per feature.
coco_pipe.io.quality.check_constant_columns()    Near-zero-variance (constant) columns.
coco_pipe.io.quality.check_outliers_zscore()     Z-score outliers above a threshold.
coco_pipe.io.quality.check_flatline()            Zero-variance (flatlined) signal arrays.
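
For intuition about what the checks listed above measure, the sketch below computes the same kinds of statistics with the standard library. The function names, default thresholds, and return shapes are assumptions for illustration; they are not the signatures of the coco_pipe.io.quality functions, and no CheckResult object is modelled.

```python
import math
from statistics import mean, pstdev


def missing_fraction(column):
    """Fraction of NaN entries in one feature column."""
    return sum(math.isnan(v) for v in column) / len(column)


def is_constant(column, tol=1e-12):
    """True when the non-NaN values have (near-)zero spread."""
    values = [v for v in column if not math.isnan(v)]
    return len(values) < 2 or pstdev(values) <= tol


def zscore_outliers(column, threshold=3.0):
    """Indices whose absolute z-score exceeds the threshold."""
    values = [v for v in column if not math.isnan(v)]
    mu, sd = mean(values), pstdev(values)
    if sd == 0:
        return []   # a constant column has no z-score outliers
    return [
        i for i, v in enumerate(column)
        if not math.isnan(v) and abs(v - mu) / sd > threshold
    ]


features = {
    "alpha": [1.0, 1.1, 0.9, 1.0, 1.2, 0.8, 1.0, 1.1, 0.9, 1.0, 1.0, 50.0],
    "beta": [2.0, float("nan"), 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0],
}

report = {
    name: {
        "missing": missing_fraction(col),
        "constant": is_constant(col),
        "outliers": zscore_outliers(col),
    }
    for name, col in features.items()
}
```

Here "alpha" is flagged for a single outlier at index 11, while "beta" is flagged as constant with one missing value. A flatline check is the same constant-spread test applied to a signal array rather than a feature column.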

The same checks run automatically when a DataContainer is added to a report.

4. Persistence

Byte-level helpers back the container’s save/load and are available directly for custom artifacts:

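from coco_pipe.io import save_object, load_object, write_json, read_json, save_npz

obj = {"weights": [0.1, 0.2]}    # placeholder object to persist
metadata = {"n_obs": 100}        # placeholder metadata

save_object(obj, "artifact.pkl")
obj = load_object("artifact.pkl")
write_json(metadata, "meta.json")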
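
As a rough picture of what helpers of this kind usually do, the sketch below round-trips an object through pickle and a metadata mapping through JSON using only the standard library. The *_sketch functions are hypothetical, and the use of pickle is an assumption inferred from the .pkl extension, not a statement about coco-pipe's internals.

```python
import json
import pickle
import tempfile
from pathlib import Path


def save_object_sketch(obj, path):
    """Pickle an object to disk (hypothetical helper, not coco-pipe's code)."""
    with open(path, "wb") as fh:
        pickle.dump(obj, fh, protocol=pickle.HIGHEST_PROTOCOL)


def load_object_sketch(path):
    """Load a pickled object. Only unpickle files you trust."""
    with open(path, "rb") as fh:
        return pickle.load(fh)


def write_json_sketch(data, path):
    """Write JSON-serializable metadata with a stable key order."""
    Path(path).write_text(json.dumps(data, indent=2, sort_keys=True))


def read_json_sketch(path):
    """Read metadata back as plain Python types."""
    return json.loads(Path(path).read_text())


with tempfile.TemporaryDirectory() as tmp:
    artifact = {"weights": [0.1, 0.2], "dims": ("obs", "feature")}
    save_object_sketch(artifact, Path(tmp) / "artifact.pkl")
    restored = load_object_sketch(Path(tmp) / "artifact.pkl")

    metadata = {"n_obs": 100, "source": "features.csv"}
    write_json_sketch(metadata, Path(tmp) / "meta.json")
    metadata_back = read_json_sketch(Path(tmp) / "meta.json")
```

Pickle preserves Python types such as tuples but should only be loaded from trusted sources. JSON is portable and human-readable but limited to basic types, so it suits metadata better than arrays.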

See the API Reference for the complete coco_pipe.io API.