Vision

How we envisage coco-pipe evolving — what it is today, the principles that guide it, and where it is headed.

The idea

coco-pipe is, first, an engine that brings the tools of cognitive and computational neuroscience under one roof — feature extraction, dimensionality reduction, trajectory analysis, classical decoding, and foundation models — all speaking a single data structure so they compose without friction.

On top of that engine we are building end-to-end pipelines. The goal is simple: throw your preprocessed data at a pipeline and, with a few CLI commands, run a battery of complementary analyses to understand it — then move from broad, exploratory analysis to focused, targeted questions, again powered by the same engine. The audience is both neuroscientists who want rigorous answers without writing boilerplate and engineers who want a dependable, composable toolkit.

We begin with M/EEG. The architecture is deliberately modality-agnostic at its core, and we aim to extend to other modalities as the engine matures.

Roadmap

Phase 1 — Build and harden the engine (current)

We are building and testing the engine and preparing the tools the pipelines will need: a shared DataContainer, leakage-safe decoding for classical and foundation models, dimensionality reduction with trajectory analysis and interpretation, and self-contained reporting. This phase is about correctness, reproducibility, and a clean, composable API.

Phase 2 — End-to-end pipelines

With the engine in place, we will ship opinionated, CLI-driven pipelines that chain these tools into ready-made exploratory and targeted analyses — so going from preprocessed data to interpretable results is a few commands, not a research-engineering project.

Design principles

Valid by construction

Cross-validation, feature selection, tuning, and calibration run inside the appropriate fold. Sound inference is the default, never an afterthought.
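To illustrate why fold placement matters, here is a minimal sketch using scikit-learn rather than coco-pipe's own API, which this page does not show. The data are pure noise, and the variable names, feature count, and `k=20` are assumptions chosen for illustration. It contrasts selecting features on the full dataset before cross-validation with selecting them inside each training fold.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 2000))  # pure noise: no real class signal
y = np.repeat([0, 1], 50)

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

# Leaky: features are chosen using every label, including those of test folds.
X_leaky = SelectKBest(f_classif, k=20).fit_transform(X, y)
leaky_scores = cross_val_score(LogisticRegression(max_iter=1000), X_leaky, y, cv=cv)

# Fold-safe: scaling and selection are refit on each training split only.
safe_pipeline = make_pipeline(
    StandardScaler(),
    SelectKBest(f_classif, k=20),
    LogisticRegression(max_iter=1000),
)
safe_scores = cross_val_score(safe_pipeline, X, y, cv=cv)

print(f"leaky accuracy:     {leaky_scores.mean():.2f}")
print(f"fold-safe accuracy: {safe_scores.mean():.2f}")
```

On noise like this, the leaky estimate typically lands well above chance while the fold-safe estimate stays close to 0.5, which is the failure mode this principle is meant to rule out.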

One data structure, end to end

Every module reads and writes a single DataContainer, so steps compose without glue code and results stay self-describing.
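As a rough picture of what "container in, container out" buys, the sketch below uses a hypothetical `ToyContainer`. It is not the real DataContainer, and its fields (`X`, `dims`, `coords`, `history`) and both step functions are assumptions made for illustration. Each step takes a container and returns a new one, so labels and processing history travel with the data.

```python
from dataclasses import dataclass, replace
from statistics import fmean


@dataclass(frozen=True)
class ToyContainer:
    """Hypothetical stand-in for a labelled container (not the real DataContainer)."""
    X: tuple             # rows of observations, each a tuple of floats
    dims: tuple          # axis names, e.g. ("obs", "feature")
    coords: dict         # labels for each axis
    history: tuple = ()  # names of the steps applied so far


def demean(c: ToyContainer) -> ToyContainer:
    """Subtract each feature's mean; labels are carried through unchanged."""
    means = [fmean(col) for col in zip(*c.X)]
    X = tuple(tuple(v - m for v, m in zip(row, means)) for row in c.X)
    return replace(c, X=X, history=c.history + ("demean",))


def select_features(c: ToyContainer, names: list) -> ToyContainer:
    """Keep only the named features and update the feature labels to match."""
    idx = [c.coords["feature"].index(n) for n in names]
    X = tuple(tuple(row[i] for i in idx) for row in c.X)
    coords = {**c.coords, "feature": list(names)}
    return replace(c, X=X, coords=coords, history=c.history + ("select_features",))


raw = ToyContainer(
    X=((1.0, 10.0, 100.0), (3.0, 30.0, 300.0)),
    dims=("obs", "feature"),
    coords={"obs": ["sub-01", "sub-02"], "feature": ["alpha", "beta", "gamma"]},
)

# Steps chain directly because each one speaks the same structure.
out = select_features(demean(raw), ["alpha", "gamma"])
print(out.X, out.coords["feature"], out.history)
```

Because every step returns the same type, the result records which features it holds and how it was produced, without any adapter code between steps.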

Declarative and reproducible

Workflows are described with validated, typed configs and seeded throughout, so an experiment is explicit, serializable, and bit-reproducible.
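A minimal sketch of the idea, using only the Python standard library: `DecodeConfig`, its fields, and `shuffled_indices` are hypothetical names, not coco-pipe's actual config schema. The config validates itself on construction, round-trips through JSON, and owns the seed for every random draw.

```python
import json
import random
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class DecodeConfig:
    """Hypothetical config; field names are illustrative only."""
    model: str = "logistic_regression"
    n_splits: int = 5
    seed: int = 0

    def __post_init__(self):
        # Fail early, before any computation starts.
        if not isinstance(self.n_splits, int) or self.n_splits < 2:
            raise ValueError("n_splits must be an integer >= 2")
        if not isinstance(self.seed, int):
            raise ValueError("seed must be an integer")


def shuffled_indices(cfg: DecodeConfig, n: int) -> list:
    """Draw from a generator seeded by the config, never from global state."""
    rng = random.Random(cfg.seed)
    idx = list(range(n))
    rng.shuffle(idx)
    return idx


cfg = DecodeConfig(n_splits=5, seed=42)
text = json.dumps(asdict(cfg), sort_keys=True)  # the experiment as plain text
restored = DecodeConfig(**json.loads(text))

assert restored == cfg
assert shuffled_indices(cfg, 10) == shuffled_indices(restored, 10)
```

Seeding makes the random draws repeatable. Reproducing numerical results down to the bit also depends on matching library versions and hardware, which a config alone cannot pin.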

Lightweight core, optional power

Heavy dependencies — deep learning, distributed compute, manifold libraries — load lazily behind extras; the base install stays small.
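One common way to implement lazy loading is to defer the import to the first call that needs it. The sketch below assumes a hypothetical helper `require` and a placeholder install hint (`yourpackage[...]`); neither is taken from coco-pipe. The standard-library `statistics` module stands in for a heavy optional dependency.

```python
import importlib


def require(module_name: str, extra: str):
    """Import an optional dependency on first use, with an actionable error."""
    try:
        return importlib.import_module(module_name)
    except ImportError as err:
        raise ImportError(
            f"'{module_name}' is required for this feature. "
            f"Install it with: pip install 'yourpackage[{extra}]'"
        ) from err


def summarize(values):
    # The import happens here, at call time, not when the package is imported.
    statistics = require("statistics", extra="stats")
    return statistics.fmean(values)


print(summarize([1.0, 2.0, 3.0]))
```

Users who never call the feature never pay the import cost, and users who call it without the extra installed get a message that tells them what to install.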

Comparable, shareable results

Reports are interactive, lineage-aware, and self-contained, built to line up many analyses side by side.

Modality-agnostic at the core

M/EEG first, but the contracts are designed so new modalities slot in without reworking the engine.

What we have today

The engine is real and in active use. Highlights per module:

📦 Data & IO

One load_data() for tabular, BIDS, and embedding sources into a labelled DataContainer; built-in quality control.

🌊 Descriptors

Spectral, parametric, and complexity feature families with channel pooling, emitted container-native.

🧠 Decoding — Classical ML

Leakage-free CV, group-aware inference, feature selection and tuning, and full-pipeline permutation testing.

🤖 Decoding — Foundation Models

Frozen, fine-tuned, and LoRA/QLoRA backbones treated as ordinary, comparable estimators, with leakage-safe cached embeddings.

🌀 Dimensionality Reduction

15+ reducers behind one interface, preservation metrics, feature interpretation, and post-hoc method ranking.

📈 Trajectory Analysis

Kinematics and time-resolved group separation over native 3D (trajectory, time, dim) embedding tensors.
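To make the (trajectory, time, dim) layout concrete, here is a small NumPy sketch. The function names, the finite-difference speed, and the centroid-distance measure of group separation are assumptions for illustration; this page does not specify which kinematic quantities or separation metrics the module uses.

```python
import numpy as np


def speed(emb: np.ndarray, dt: float = 1.0) -> np.ndarray:
    """Finite-difference speed per trajectory, shape (trajectory, time - 1)."""
    if emb.ndim != 3:
        raise ValueError("expected a (trajectory, time, dim) array")
    velocity = np.diff(emb, axis=1) / dt  # (trajectory, time - 1, dim)
    return np.linalg.norm(velocity, axis=-1)


def centroid_distance(emb: np.ndarray, labels: np.ndarray) -> np.ndarray:
    """Distance between the two group centroids at each time point, shape (time,)."""
    a = emb[labels == 0].mean(axis=0)  # (time, dim)
    b = emb[labels == 1].mean(axis=0)
    return np.linalg.norm(a - b, axis=-1)


t = np.arange(4.0)
emb = np.stack([
    np.outer(t, [1.0, 0.0]),  # moves 1 unit per step along the first dim
    np.outer(t, [3.0, 4.0]),  # moves 5 units per step
])                            # shape (2, 4, 2)
labels = np.array([0, 1])

print(speed(emb))                      # one row of speeds per trajectory
print(centroid_distance(emb, labels))  # separation as a function of time
```

Keeping time as its own axis means both quantities are plain reductions over the last axis, with no reshaping between trajectories and time points.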

📊 Visualization

Mirrored Matplotlib/Plotly backends from one theme, from exploratory to publication-ready.

📄 Reports

Self-contained, interactive HTML that makes many experiments easy to compare.

Where we are heading

CLI-driven end-to-end pipelines

Phase 2: preprocessed data in, exploratory-then-targeted analysis out, in a few commands.

Deeper foundation-model workflows

More backbones, richer fine-tuning, and embedding reuse across analyses.

Domain-aware methods

Reducers and metrics that respect the physical and topological structure of neural data.

Comparative reporting at scale

Reports that line up models, modalities, and subjects for side-by-side interpretation.

More modalities

Support for modalities beyond M/EEG as the engine matures, plus broader standardized loading.

Get involved

coco-pipe is in active pre-release development and we welcome contributions. See the Contributing to coco-pipe guide to get started.