Data Loading and Saving

This module provides functions for loading experimental data from various formats and saving processed experiments.

Functions

driada.experiment.exp_build.load_experiment(data_source, exp_params, force_rebuild=False, force_reload=False, via_pydrive=True, gauth=None, root='DRIADA data', exp_path=None, data_path=None, force_continuous=[], feature_types=None, bad_frames=[], static_features=None, reconstruct_spikes='wavelet', save_to_pickle=False, verbose=True, router_source=None)[source]

Load or create an Experiment object with automatic caching and cloud support.

This function provides a high-level interface for loading experiments with smart caching, automatic cloud data download (for IABS data), and pickle serialization. It first checks for cached experiments, then loads from local data files, and finally downloads from cloud storage if needed.

Parameters:

data_source (str) – Data source identifier. ‘IABS’ enables automatic cloud download. Other sources (e.g., ‘MyLab’) require data_path parameter pointing to a local NPZ file.
exp_params (dict) – Experiment parameters dictionary. See load_exp_from_aligned_data for required fields based on data_source.
force_rebuild (bool, default=False) – If True, rebuild experiment from data files even if pickle cache exists. The existing pickle is ignored completely.
force_reload (bool, default=False) – If True, re-download data from cloud even if local files exist. Also bypasses pickle cache (similar to force_rebuild).
via_pydrive (bool, default=True) – Use PyDrive for Google Drive access. If False, uses alternative method.
gauth (GoogleAuth object, optional) – Pre-authenticated GoogleAuth object for Drive access. If None, will create new authentication.
root (str, default='DRIADA data') – Root directory for storing experiments and data.
exp_path (str, optional) – Custom path for experiment pickle file. If None, uses standard naming: {root}/{expname}/Exp {expname}.pickle
data_path (str, optional) – Path to NPZ data file. Required for non-IABS data sources. For IABS, if None, uses standard naming: {root}/{expname}/Aligned data/{expname} syn data.npz
force_continuous (list, optional) – Deprecated. See load_exp_from_aligned_data.
feature_types (dict[str, str], optional) – Feature type overrides. See load_exp_from_aligned_data.
bad_frames (list, optional) – Frame indices to mark as bad. See load_exp_from_aligned_data.
static_features (dict, optional) – Static experimental parameters. See load_exp_from_aligned_data.
reconstruct_spikes (str or bool, default='wavelet') – Spike reconstruction method. See load_exp_from_aligned_data.
save_to_pickle (bool, default=False) – Whether to save the experiment to pickle after creation.
verbose (bool, default=True) – Print progress messages.
router_source (str, pandas.DataFrame, or None, optional) –
Source of the router data for IABS experiments:
- None: Downloads from URL in config.py (default behavior)
- str: Direct Google Sheets export URL
- pandas.DataFrame: Pre-loaded router DataFrame
Only used when data_source=‘IABS’ and downloading from cloud.

Returns:

exp (Experiment) – The loaded or created Experiment object.
load_log (list or None) – Cloud download log if data was downloaded, None otherwise. Always None for local loads or pickle loads.

Return type:

tuple

Raises:

ValueError – If root exists but is not a directory, or if data_source is not ‘IABS’ and no data_path is provided.
FileNotFoundError – If data file not found and cannot be downloaded.

Side Effects

- Creates root directory if it doesn't exist
- Creates experiment subdirectory structure
- Downloads data from cloud for IABS source (if needed)
- Saves pickle file if save_to_pickle=True and building from data
- Prints progress messages if verbose=True

Notes

Loading priority:
1. If pickle exists and neither force_rebuild nor force_reload is set: load from pickle
2. If local data exists and force_reload is not set: load from data file
3. If IABS source: attempt cloud download
4. Otherwise: raise error
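
The loading priority above can be read as a small decision function. The sketch below is only an illustration of that order: choose_load_action and its return labels are hypothetical names, not part of driada, and the real function may check more conditions than the four listed steps.

```python
def choose_load_action(data_source, pickle_exists, data_exists,
                       force_rebuild=False, force_reload=False):
    """Return which step of the documented loading priority would run.

    Hypothetical helper for illustration only; not part of driada.
    """
    # Step 1: use the cached pickle unless a rebuild or reload is forced.
    if pickle_exists and not (force_rebuild or force_reload):
        return "pickle"
    # Step 2: build from the local data file unless a re-download is forced.
    if data_exists and not force_reload:
        return "data_file"
    # Step 3: only the IABS source can fall back to a cloud download.
    if data_source == "IABS":
        return "cloud_download"
    # Step 4: nothing left to try.
    raise FileNotFoundError(
        f"No usable data for source {data_source!r} and no cloud fallback"
    )


print(choose_load_action("IABS", pickle_exists=True, data_exists=True))
print(choose_load_action("MyLab", True, True, force_rebuild=True))
print(choose_load_action("IABS", True, True, force_reload=True))
```

Under these assumptions, force_rebuild only skips the pickle, while force_reload skips both the pickle and the local data file.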

For IABS data, expects cloud structure with ‘Aligned data’ containing npz files with calcium and behavioral data.
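
The default locations given in the exp_path and data_path descriptions can be assembled with pathlib. This is a minimal sketch of those two naming patterns only; default_paths is a hypothetical helper, and the experiment name used here is just an example value.

```python
from pathlib import Path


def default_paths(root, expname):
    """Build the documented default pickle and NPZ paths.

    Hypothetical helper for illustration; not part of driada.
    """
    exp_dir = Path(root) / expname
    # {root}/{expname}/Exp {expname}.pickle
    pickle_path = exp_dir / f"Exp {expname}.pickle"
    # {root}/{expname}/Aligned data/{expname} syn data.npz
    data_path = exp_dir / "Aligned data" / f"{expname} syn data.npz"
    return pickle_path, data_path


pickle_path, data_path = default_paths("DRIADA data", "linear_track_mouse01_day1")
print(pickle_path.as_posix())
print(data_path.as_posix())
```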

The function returns a tuple (exp, load_log) to maintain backward compatibility, even though load_log is often None.

Examples

>>> # Load IABS data with custom router URL
>>> url = "https://docs.google.com/spreadsheets/d/.../export?format=xlsx"
>>> exp, _ = load_experiment(  
...     'IABS',
...     {'track': 'linear', 'animal_id': 'CA1_01', 'session': '1'},
...     router_source=url,
...     verbose=False
... )

>>> # Load external lab data from NPZ file
>>> import tempfile
>>> import numpy as np
>>>
>>> # Create test data file
>>> with tempfile.NamedTemporaryFile(delete=False, suffix='.npz') as f:
...     temp_data = f.name
>>> test_data = {
...     'calcium': np.random.rand(30, 500),
...     'position': np.random.rand(500) * 100
... }
>>> np.savez(temp_data, **test_data)
>>>
>>> # Load from local file
>>> exp, _ = load_experiment(
...     'MyLab',
...     {'name': 'test_exp'},
...     data_path=temp_data,
...     verbose=False
... )
>>> exp.signature
'Exp test_exp'
>>> exp.n_cells
30
>>>
>>> # Force rebuild even if pickle exists
>>> with tempfile.TemporaryDirectory() as tmpdir:
...     exp2, _ = load_experiment(
...         'MyLab',
...         {'name': 'rebuild_test'},
...         data_path=temp_data,
...         root=tmpdir,
...         force_rebuild=True,
...         save_to_pickle=True,
...         verbose=False
...     )
>>> exp2.n_cells
30
>>>
>>> # Cleanup
>>> import os
>>> os.unlink(temp_data)

driada.experiment.exp_build.load_exp_from_aligned_data(data_source, exp_params, data, force_continuous=[], feature_types=None, bad_frames=[], static_features=None, verbose=True, reconstruct_spikes=None, aggregate_features=None, n_jobs=-1, enable_parallelization=True, create_circular_2d=True)[source]

Create an Experiment object from aligned neural and behavioral data.

Constructs an Experiment instance from pre-aligned calcium imaging data and behavioral variables, automatically determining feature types and filtering out constant or invalid features.

Parameters:

data_source (str) – Identifier for the data source (e.g., ‘IABS’, ‘custom’). Used with exp_params to construct the experiment name.
exp_params (dict) –
Experiment parameters dictionary. For IABS data source, requires:
- ‘track’: experimental paradigm (e.g., ‘linear_track’)
- ‘animal_id’: subject identifier
- ‘session’: session identifier
For other sources, can contain any metadata for experiment naming.
data (dict) –
Dictionary containing aligned data with keys:
- ‘calcium’ or ‘Calcium’: 2D array of calcium signals (neurons x time)
- ‘spikes’ or ‘Spikes’: 2D array of spike data (optional)
- Other keys: behavioral variables as 1D or 2D arrays. 1D arrays (time,) are treated as single time series; 2D arrays (components, time) are treated as MultiTimeSeries.
force_continuous (list, optional) – Deprecated. Use feature_types instead. List of feature names to force as continuous. Converted to feature_types={f: 'continuous'} internally if feature_types is not provided.
feature_types (dict[str, str], optional) – Map of feature names to type strings, overriding auto-detection. See TimeSeries._create_type_from_string for valid strings. Also acts as a circular whitelist: unlisted auto-circular features are overridden to linear.
bad_frames (list, optional) – List of frame indices to mark as bad/invalid. These frames will be masked in the resulting Experiment object. Useful for removing motion artifacts or recording gaps.
static_features (dict, optional) –
Static experimental parameters. Common keys:
- ‘t_rise_sec’: calcium rise time (default: 0.25)
- ‘t_off_sec’: calcium decay time (default: 2.0)
- ‘fps’: frame rate in Hz (default: 20.0)
- Any other experiment-specific constants
verbose (bool, default=True) – Whether to print progress and feature information.
reconstruct_spikes (str, bool, or None, default=None) –
Deprecated. Load the experiment first, then call exp.reconstruct_all_neurons() separately for better control.

If provided (for backward compatibility):
- ‘wavelet’: wavelet-based detection (old batch method)
- False/None: no reconstruction (recommended)

New workflow (recommended):
>>> exp = load_exp_from_aligned_data(data_source, exp_params, data)
>>> exp.reconstruct_all_neurons(method='wavelet', n_iter=3)
aggregate_features (dict, optional) –
Dictionary mapping tuples of feature keys to combined names. Allows pre-specifying which features should be combined into MultiTimeSeries before Experiment building. This is useful for deterministic data hash generation.

Format: {(key1, key2, …): “combined_name”, …}

Example:
>>> aggregate_features = {
...     ("x", "y"): "position",  # Combine x, y into 2D MultiTimeSeries
...     ("speed", "direction"): "velocity",
... }

The component features remain available as individual features in addition to the combined MultiTimeSeries.
n_jobs (int, default=-1) – Number of parallel jobs for neuron construction and other parallel operations. Use -1 for all available cores, 1 to disable parallelization.
enable_parallelization (bool, default=True) – Enable parallel processing for neuron construction and hash computation. Set to False to use sequential processing (useful for debugging).
create_circular_2d (bool, default=True) – If True, automatically create _2d versions of circular features (detected via type_info.is_circular) as (cos, sin) MultiTimeSeries. Original features are preserved. E.g., ‘headdirection’ -> also creates ‘headdirection_2d’. This improves MI estimation accuracy for circular variables like head direction.
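
To see why a (cos, sin) pair helps for circular variables, the sketch below encodes a few head directions and compares distances before and after encoding. It is an illustration only: circular_to_2d and embedded_distance are hypothetical helpers, and the angles are assumed to be in radians, which the description above does not state.

```python
import math


def circular_to_2d(angles_rad):
    """Encode angles as two rows, (cos, sin), shaped (components, time).

    Hypothetical helper for illustration; assumes angles are in radians.
    """
    cos_row = [math.cos(a) for a in angles_rad]
    sin_row = [math.sin(a) for a in angles_rad]
    return [cos_row, sin_row]


def embedded_distance(encoded, i, j):
    """Euclidean distance between time points i and j in (cos, sin) space."""
    return math.hypot(encoded[0][i] - encoded[0][j],
                      encoded[1][i] - encoded[1][j])


# Head directions of 1, 359 and 180 degrees.
headdirection = [math.radians(d) for d in (1.0, 359.0, 180.0)]
headdirection_2d = circular_to_2d(headdirection)

# As raw numbers, 1 and 359 degrees look far apart (about 6.25 rad).
raw_gap = abs(headdirection[0] - headdirection[1])
# In (cos, sin) space they are close, while 1 and 180 degrees stay far apart.
near = embedded_distance(headdirection_2d, 0, 1)
far = embedded_distance(headdirection_2d, 0, 2)
print(raw_gap, near, far)
```

The raw values place 1° and 359° at opposite ends of the range, whereas the 2D encoding keeps them adjacent, so the wrap-around no longer looks like a large jump.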

Returns:

Initialized Experiment object with processed data.

Return type:

Experiment

Raises:

TypeError – If data or exp_params are not dictionaries.
ValueError – If data is empty or calcium data is missing.

Side Effects

- Prints feature information if verbose=True
- Creates deep copy of input data

Notes

Features with ≤1 unique non-NaN values are filtered as “garbage”
Feature types (discrete/continuous) are automatically determined by checking if values appear to be categorical (few unique values) or continuous
Case-insensitive key matching for ‘calcium’ and ‘spikes’
Creates a deep copy of input data to avoid modifying the original
The experiment name is constructed using construct_session_name()
Bad frames create a boolean mask; indices beyond data length are ignored
Scalar values (0D arrays) are ignored with a warning - use static_features instead
Non-numeric features (strings, objects) are ignored with a warning
2D arrays are automatically converted to MultiTimeSeries objects
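
Two of the rules above, the "garbage" filter and the bad-frame mask, are easy to sketch in plain Python. Both helpers below are hypothetical and use lists instead of arrays. Two choices are assumptions made for this sketch and are not confirmed by the notes: True marks a good frame, and negative indices are ignored.

```python
import math


def is_garbage_feature(values):
    """True if the feature has at most one unique non-NaN value."""
    unique = {
        v for v in values
        if not (isinstance(v, float) and math.isnan(v))
    }
    return len(unique) <= 1


def bad_frame_mask(n_frames, bad_frames):
    """Boolean mask over frames; True marks a good frame (assumption)."""
    mask = [True] * n_frames
    for idx in bad_frames:
        # Indices beyond the data length are ignored, as the notes describe.
        # Ignoring negative indices is an assumption of this sketch.
        if 0 <= idx < n_frames:
            mask[idx] = False
    return mask


print(is_garbage_feature([1.0, float("nan"), 1.0]))  # constant apart from NaN
print(is_garbage_feature([0, 1, 0, 1]))
print(bad_frame_mask(5, [1, 3, 99]))
```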

Examples

>>> # Basic usage with minimal data
>>> import numpy as np
>>> np.random.seed(42)  # For reproducibility
>>> data = {
...     'calcium': np.random.rand(50, 1000),  # 50 neurons, 1000 frames
...     'position': np.linspace(0, 100, 1000),  # Linear track position
...     'speed': np.random.rand(1000) * 10,  # Random speeds
...     'trial_type': np.repeat([0, 1, 0, 1], 250)  # Discrete variable
... }
>>> exp_params = {
...     'track': 'linear_track',
...     'animal_id': 'mouse01',
...     'session': 'day1'
... }
>>> exp = load_exp_from_aligned_data('IABS', exp_params, data, verbose=False)
>>> exp.signature
'Exp linear_track_mouse01_day1'
>>> sorted(exp.dynamic_features.keys())
['position', 'speed', 'trial_type']

>>> # Force discrete variable to be continuous
>>> exp2 = load_exp_from_aligned_data(
...     'IABS', exp_params, data,
...     force_continuous=['trial_type'],
...     bad_frames=[10, 11, 12],  # Mark frames as bad
...     static_features={'fps': 30.0},  # Override default fps
...     verbose=False
... )
>>> exp2.static_features['fps']
30.0
>>> exp2.dynamic_features['trial_type'].discrete  # Should be False due to force_continuous
False
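
The signature in the first example suggests that, for IABS data, the experiment name joins track, animal_id and session with underscores. The sketch below only reproduces that observed pattern. iabs_session_name is a hypothetical helper, and the actual construct_session_name() may apply rules not visible in the example.

```python
def iabs_session_name(exp_params):
    """Join the three IABS fields with underscores (pattern inferred from
    the example output; hypothetical helper, not part of driada)."""
    required = ("track", "animal_id", "session")
    missing = [key for key in required if key not in exp_params]
    if missing:
        raise ValueError(f"Missing IABS exp_params fields: {missing}")
    return "_".join(str(exp_params[key]) for key in required)


exp_params = {"track": "linear_track", "animal_id": "mouse01", "session": "day1"}
signature = f"Exp {iabs_session_name(exp_params)}"
print(signature)
```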

driada.experiment.exp_build.save_exp_to_pickle(exp, path, verbose=True)[source]

Save an Experiment object to a pickle file.

Parameters:

exp (Experiment) – The Experiment object to save.
path (str) – File path where the pickle will be saved.
verbose (bool, default=True) – Whether to print save confirmation.

Raises:

PermissionError – If no write permission for the path.
OSError – If path is invalid or other OS-related errors.

Examples

>>> # Create a test experiment
>>> import tempfile
>>> import os
>>> from driada.experiment import load_demo_experiment
>>> exp = load_demo_experiment(verbose=False)
>>>
>>> # Save experiment to temporary file
>>> with tempfile.NamedTemporaryFile(delete=False, suffix='.pkl') as f:
...     temp_path = f.name
>>> save_exp_to_pickle(exp, temp_path)  
Experiment Exp demo saved to ...

>>> # Save without verbose output
>>> save_exp_to_pickle(exp, temp_path, verbose=False)
>>>
>>> # Cleanup
>>> os.unlink(temp_path)

Notes

Uses Python’s pickle module with default protocol. Creates parent directories if they don’t exist.
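
The behaviour described in these notes, creating missing parent directories and then pickling with the default protocol, can be sketched with the standard library alone. save_object and load_object are hypothetical stand-ins that work on any picklable object. They are not the driada implementation.

```python
import os
import pickle
import tempfile


def save_object(obj, path):
    """Pickle obj to path, creating parent directories first."""
    parent = os.path.dirname(os.path.abspath(path))
    os.makedirs(parent, exist_ok=True)
    with open(path, "wb") as f:
        pickle.dump(obj, f)  # default protocol


def load_object(path):
    """Unpickle and return the object stored at path."""
    with open(path, "rb") as f:
        return pickle.load(f)


with tempfile.TemporaryDirectory() as tmpdir:
    # The nested directories do not exist yet; save_object creates them.
    target = os.path.join(tmpdir, "nested", "dir", "exp.pkl")
    save_object({"signature": "Exp demo"}, target)
    restored = load_object(target)

print(restored)
```

As with any pickle file, only load files from sources you trust, because unpickling can execute arbitrary code.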

driada.experiment.exp_build.load_exp_from_pickle(path, verbose=True)[source]

Load an Experiment object from a pickle file.

Parameters:

path (str) – Path to the pickle file.
verbose (bool, default=True) – Whether to print load confirmation.

Returns:

The loaded Experiment object.

Return type:

Experiment

Raises:

FileNotFoundError – If the pickle file doesn’t exist.
PermissionError – If no read permission for the file.
OSError – If path is invalid or other OS-related errors.

Examples

>>> # Create and save a test experiment first
>>> import tempfile
>>> from driada.experiment import load_demo_experiment
>>> test_exp = load_demo_experiment(verbose=False)
>>> with tempfile.NamedTemporaryFile(delete=False, suffix='.pkl') as f:
...     temp_path = f.name
>>> save_exp_to_pickle(test_exp, temp_path, verbose=False)
>>>
>>> # Load experiment from file
>>> exp = load_exp_from_pickle(temp_path)  
Experiment Exp demo loaded from ...

>>> # Load without verbose output
>>> exp = load_exp_from_pickle(temp_path, verbose=False)
>>> exp.signature
'Exp demo'
>>>
>>> # Cleanup
>>> import os
>>> os.unlink(temp_path)

Notes

Uses Python’s pickle module for deserialization. Prints experiment signature upon successful load if verbose=True.

Usage Examples

Loading from IABS Data

from driada.experiment import load_experiment

# Define experiment parameters for IABS data
exp_params = {
    'track': 'STFP',
    'animal_id': 'M123',
    'session': '1'
}

# Load from Google Drive (IABS data)
# Note: Requires config.py with IABS_ROUTER_URL set
# exp_gdrive = load_experiment('IABS', exp_params, via_pydrive=True)

Loading Pre-aligned Data

from driada.experiment import load_exp_from_aligned_data
import numpy as np

# Prepare aligned data dictionary
# Note: 1D arrays become single time series; 2D arrays (components x time) become MultiTimeSeries
data = {
    'calcium': np.random.randn(50, 10000),  # 50 neurons, 10000 timepoints
    'position_x': np.random.randn(10000),   # x coordinates
    'position_y': np.random.randn(10000),   # y coordinates
    'velocity_x': np.random.randn(10000),   # x velocity
    'velocity_y': np.random.randn(10000),   # y velocity
}

# Create experiment from IABS-style data
exp = load_exp_from_aligned_data(
    data_source='IABS',
    exp_params={
        'track': 'STFP',
        'animal_id': 'M001',
        'session': '1'
    },
    data=data,
    static_features={'fps': 30.0}  # Pass fps as static feature
)

Loading from Generic Lab Data

from driada.experiment import load_experiment

# Load from NPZ file for non-IABS labs
# Using example data file from the repository
exp, _ = load_experiment(
    'MyLab',
    {'name': 'spatial_navigation_task'},
    data_path='examples/example_data/sample_recording.npz',
    reconstruct_spikes=False,  # Disable for speed
    verbose=False
)

# Load with custom naming based on parameters
exp2, _ = load_experiment(
    'NeuroLab',
    {'subject': 'rat42', 'session': 'day3', 'experiment': 'maze'},
    data_path='examples/example_data/sample_recording.npz',
    reconstruct_spikes=False,  # Disable for speed
    save_to_pickle=False,
    verbose=False
)

# NPZ files automatically handle multidimensional features
# 2D arrays become MultiTimeSeries objects
# Scalar and non-numeric values are ignored with warnings
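
Before loading, it can help to check how each entry of a data dictionary would be treated under the rules stated in the comments above. The sketch below is an assumption-laden approximation: classify_entry is a hypothetical helper, it requires NumPy, and it treats boolean and higher-dimensional arrays in ways the documentation does not specify.

```python
import numpy as np


def classify_entry(value):
    """Guess how a behavioral entry would be handled (hypothetical helper)."""
    arr = np.asarray(value)
    if arr.ndim == 0:
        return "ignored (scalar)"
    # Assumption: only numeric dtypes count; booleans are not covered here.
    if not np.issubdtype(arr.dtype, np.number):
        return "ignored (non-numeric)"
    if arr.ndim == 1:
        return "TimeSeries"
    if arr.ndim == 2:
        return "MultiTimeSeries"
    return "unsupported (more than 2 dimensions)"


data = {
    "calcium": np.zeros((3, 10)),
    "position": np.zeros((2, 10)),       # (components, time)
    "speed": np.zeros(10),               # (time,)
    "fps": 30.0,                         # scalar
    "labels": np.array(["A", "B"] * 5),  # strings
}

# Neural data keys are handled separately, so skip them in the report.
report = {
    key: classify_entry(value)
    for key, value in data.items()
    if key.lower() not in ("calcium", "spikes")
}
for key, kind in report.items():
    print(f"{key}: {kind}")
```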

Handling Multidimensional Features

import numpy as np
from driada.experiment import load_exp_from_aligned_data

# 2D features (e.g., position) are automatically handled
data = {
    'calcium': np.random.randn(50, 10000),
    'position': np.random.randn(2, 10000),  # 2D trajectory -> MultiTimeSeries
    'speed': np.random.randn(10000),        # 1D -> TimeSeries
}

# Create experiment - 2D arrays automatically become MultiTimeSeries
exp = load_exp_from_aligned_data(
    data_source='MyLab',
    exp_params={'name': 'test_exp'},
    data=data
)

# Access multidimensional features
print(exp.position.n_dim)  # 2
print(exp.position.data.shape)  # (2, 10000)

Saving and Loading Experiments

from driada.experiment import save_exp_to_pickle, load_exp_from_pickle

# First create/load an experiment
import numpy as np
from driada.experiment import load_exp_from_aligned_data
data = {
    'calcium': np.random.rand(10, 1000),
    'position': np.random.rand(1000)
}
exp = load_exp_from_aligned_data('MyLab', {'name': 'test'}, data, verbose=False)

# Save experiment
save_exp_to_pickle(exp, 'processed_experiment.pkl')

# Load later
exp_loaded = load_exp_from_pickle('processed_experiment.pkl')

# Verify data integrity
assert np.allclose(exp.calcium.data, exp_loaded.calcium.data)

MATLAB File Format

Expected structure for MATLAB files:

% Required fields
data.calcium = [n_neurons x n_timepoints];  % Calcium imaging data
data.fps = 30;                              % Sampling rate

% Optional fields
data.spikes = [n_neurons x n_timepoints];   % Spike data
data.behavior.position = [2 x n_timepoints]; % Position data
data.behavior.velocity = [2 x n_timepoints]; % Velocity data
data.info.mouse_id = 'M001';                % Metadata

Custom Loaders

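from pathlib import Path
from driada.experiment import load_exp_from_aligned_data

# For custom formats, build a data dict and pass it to load_exp_from_aligned_data
def load_custom_format(filename):
    # Load your custom format (custom_loader is a placeholder for your own reader)
    raw = custom_loader(filename)

    # Assemble the aligned data dict: 'calcium' plus one key per behavioral variable
    data = {
        'calcium': raw['neural_activity'],      # neurons x time
        'stimulus': raw['stimulus_trace'],
        'response': raw['behavioral_response']
    }

    # Create experiment; constants such as fps go in static_features
    return load_exp_from_aligned_data(
        data_source='MyLab',
        exp_params={'name': Path(filename).stem},
        data=data,
        static_features={'fps': raw['sampling_rate']}
    )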

Error Handling

import numpy as np
from driada.experiment import load_experiment, load_exp_from_aligned_data

# Missing data_path for non-IABS sources
try:
    exp, _ = load_experiment('MyLab', {'name': 'test'})
except ValueError as e:
    print(e)  # "For data source 'MyLab', you must provide the 'data_path' parameter"

# Missing calcium data
try:
    data = {'position': np.random.randn(1000)}  # No calcium!
    exp = load_exp_from_aligned_data('MyLab', {}, data)
except ValueError as e:
    print(e)  # "No calcium data found!"

# Warnings for invalid data types
data = {
    'calcium': np.random.randn(50, 1000),
    'fps': 30.0,  # Scalar - will be ignored with warning
    'labels': np.array(['A', 'B', 'C'] * 333 + ['A'])  # Non-numeric - ignored
}
exp = load_exp_from_aligned_data('MyLab', {}, data)
# Warning: Ignoring scalar value 'fps' found in NPZ file
# Warning: Ignoring non-numeric feature 'labels' with dtype <U1

Best Practices

Data Organization: Keep calcium and behavior data time-aligned
Metadata: Include as much metadata as possible in exp_params
File Formats: Use NPZ for raw data, pickle for processed experiments
Compression: Python's pickle module does not compress its output; wrap the file in gzip if size matters
Large Files: Consider HDF5 for very large datasets
Static Features: Pass constants like fps via static_features parameter
Multidimensional Data: Store as 2D arrays (components x time) in NPZ files
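
If file size matters, a pickle stream can be wrapped in gzip using only the standard library. The sketch below is independent of save_exp_to_pickle and uses a plain dictionary as a stand-in payload. Whether an Experiment object round-trips the same way is an assumption, not something tested here.

```python
import gzip
import os
import pickle
import tempfile

# Stand-in payload: 10 rows of 1000 zeros, highly compressible.
payload = {"calcium": [[0.0] * 1000 for _ in range(10)]}

with tempfile.TemporaryDirectory() as tmpdir:
    plain_path = os.path.join(tmpdir, "exp.pkl")
    gz_path = os.path.join(tmpdir, "exp.pkl.gz")

    # Plain pickle, written as-is.
    with open(plain_path, "wb") as f:
        pickle.dump(payload, f)

    # Same pickle stream, written through a gzip file object.
    with gzip.open(gz_path, "wb") as f:
        pickle.dump(payload, f)

    # Reading back mirrors writing: open with gzip, then unpickle.
    with gzip.open(gz_path, "rb") as f:
        restored = pickle.load(f)

    plain_size = os.path.getsize(plain_path)
    gz_size = os.path.getsize(gz_path)

print(f"plain: {plain_size} bytes, gzip: {gz_size} bytes")
```

The gain depends on the data: real calcium traces are noisy and compress far less than this all-zero example.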