Download Functions

This module provides functions for downloading files and folders from Google Drive.

Functions

driada.gdrive.download.download_gdrive_data(data_router, expname, whitelist=['Timing.xlsx'], via_pydrive=False, data_pieces=None, tdir='DRIADA data', gauth=None)

Download experimental data from Google Drive based on a data router table or direct link.

Uses a data router DataFrame to locate and download experimental data files from Google Drive folders specified for each experiment. Alternatively, can accept a direct Google Drive share link to download from a single folder.

Parameters:

data_router (pandas.DataFrame or str) – Either a DataFrame containing experiment names and corresponding Google Drive links for different data types (must have an ‘Experiment’ column), or a string containing a direct Google Drive share link to download from.
expname (str) – Name of the experiment to download data for. Must match an entry in the ‘Experiment’ column of data_router if data_router is a DataFrame. Used as folder name and filename filter if data_router is a share link.
whitelist (list of str, optional) – List of file names to always download regardless of naming patterns. Default is [‘Timing.xlsx’].
via_pydrive (bool, optional) – If True, use PyDrive2 for downloading (requires authentication). If False, use gdown. Default is False.
data_pieces (list of str or None, optional) – List of data types (column names) to download. If None, downloads all available data types except certain excluded ones. Default is None. Ignored when data_router is a share link.
tdir (str, optional) – Target directory name for downloaded data. Default is ‘DRIADA data’.
gauth (GoogleAuth object or None, optional) – PyDrive2 authentication object. Required if via_pydrive=True. Default is None.

Returns:

success (bool) – True if at least one file was successfully downloaded, False otherwise.
load_log (list) – Captured output log from the download process.

Raises:

ValueError – If data_router is not a DataFrame or string. If data_router is a DataFrame but lacks required ‘Experiment’ column. If via_pydrive=True but gauth is None.

Notes

When data_router is a DataFrame: The function creates a directory structure tdir/expname/data_type/ for organizing downloaded files. Data types excluded by default are: ‘Experiment’, ‘Description’, ‘Video’, ‘Aligned data’, ‘Computation results’.
When data_router is a share link: The function creates a directory structure tdir/expname/ and downloads all files matching the expname filter.

Empty directories are automatically removed after download attempts.
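
The directory layout and the cleanup step described in these notes can be sketched with the standard library alone. This is a minimal illustration, not the code used by download_gdrive_data: the helper names make_target_dir and remove_empty_dirs and the data type names ‘Traces’ and ‘Behavior’ are hypothetical.

```python
import tempfile
from pathlib import Path


def make_target_dir(tdir, expname, data_type=None):
    """Create tdir/expname[/data_type] and return it as a Path."""
    target = Path(tdir) / expname
    if data_type is not None:
        target = target / data_type
    target.mkdir(parents=True, exist_ok=True)
    return target


def remove_empty_dirs(root):
    """Delete every empty directory below root (root itself is kept)."""
    subdirs = [p for p in Path(root).rglob("*") if p.is_dir()]
    # Deepest paths first, so a parent that only held empty folders
    # becomes empty itself and is removed in the same pass.
    for path in sorted(subdirs, key=lambda p: len(p.parts), reverse=True):
        if not any(path.iterdir()):
            path.rmdir()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        kept = make_target_dir(tmp, "exp001", "Traces")   # hypothetical data type
        make_target_dir(tmp, "exp001", "Behavior")        # nothing downloaded here
        (kept / "exp001_traces.csv").write_text("t,x\n0,1\n")
        remove_empty_dirs(tmp)
        print(sorted(p.name for p in (Path(tmp) / "exp001").iterdir()))
```

Removing directories deepest-first means an experiment folder whose data-type subfolders all stayed empty disappears as well, which matches the behavior described above for failed download attempts.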

Examples

>>> # Using DataFrame router
>>> success, log = download_gdrive_data(  
...     data_router=router_df,
...     expname='exp001'
... )

>>> # Using direct share link
>>> success, log = download_gdrive_data(  
...     data_router='https://drive.google.com/drive/folders/...',
...     expname='exp001'
... )

driada.gdrive.download_part_of_folder(output, folder, key='', antikey=None, whitelist=[], extensions=['.csv', '.xlsx', '.npz'], via_pydrive=False, gauth=None, maxfiles=None)

Download specific files from a Google Drive folder based on filtering criteria.

Downloads files from a Google Drive folder that match specific name patterns and file extensions. Supports both gdown (no authentication) and PyDrive2 (requires authentication) methods.

Parameters:

output (str) – Local directory path where files will be downloaded. Directory will be created if it doesn’t exist.
folder (str) – Google Drive folder share link. Must be a valid Google Drive URL.
key (str, optional) – Substring that must be present in file names to be downloaded. Default is empty string (matches all).
antikey (str or None, optional) – Substring that, if present in file names, will exclude them from download. Default is None.
whitelist (list of str, optional) – List of exact file names to download regardless of other criteria. Default is empty list.
extensions (list of str, optional) – List of allowed file extensions. Default is [‘.csv’, ‘.xlsx’, ‘.npz’].
via_pydrive (bool, optional) – If True, use PyDrive2 (requires authentication but supports more files). If False, use gdown (no auth but limited). Default is False.
gauth (GoogleAuth object or None, optional) – PyDrive2 authentication object. Required if via_pydrive=True. Default is None.
maxfiles (int or None, optional) – Maximum number of files to download. Default is None (no limit).
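
Because folder must be a folder share link, it can help to check a link before passing it in. The sketch below is one possible pre-check and is not part of driada: the helper name folder_id_from_link is hypothetical, and it only assumes the common share-link shape https://drive.google.com/drive/folders/<id>.

```python
import re
from urllib.parse import urlparse


def folder_id_from_link(link):
    """Return the folder ID from a Drive folder share link, or None."""
    parsed = urlparse(link)
    if parsed.netloc != "drive.google.com":
        return None
    # Query parameters such as ?usp=sharing live in parsed.query,
    # so they never end up in the captured ID.
    match = re.search(r"/folders/([A-Za-z0-9_-]+)", parsed.path)
    return match.group(1) if match else None


if __name__ == "__main__":
    link = "https://drive.google.com/drive/folders/1AbC_dEf-123?usp=sharing"
    print(folder_id_from_link(link))
```

A link that points at a single file (…/file/d/…/view) yields None here, which is a cheap way to catch the wrong kind of link before starting a download.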

Returns:

return_code (bool) – True if download completed successfully, False otherwise.
rel (list of tuple) – List of (file_id, file_name) tuples for downloaded files.
load_log (list) – Captured output log from the download process.

Raises:

ValueError – If via_pydrive=True but gauth is None.
FileNotFoundError – If download fails when not using PyDrive2.
OSError – If unable to create output directory or write files.

Notes

When using PyDrive2, the filtering parameters (antikey, whitelist, extensions) are applied in the same way as on the gdown path.
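
How the filtering parameters combine can be illustrated on a plain list of file names. This is a sketch under stated assumptions rather than the library's implementation: the helper select_files is hypothetical, and it assumes that whitelist entries bypass the key, antikey, and extensions checks and that maxfiles truncates the final list.

```python
from pathlib import PurePath


def select_files(names, key="", antikey=None, whitelist=(),
                 extensions=(".csv", ".xlsx", ".npz"), maxfiles=None):
    """Return the file names that pass the documented filter criteria."""
    selected = []
    for name in names:
        if name in whitelist:
            keep = True  # assumption: whitelist wins over all other criteria
        else:
            keep = (
                key in name  # an empty key matches every name
                and (antikey is None or antikey not in name)
                and PurePath(name).suffix in extensions
            )
        if keep:
            selected.append(name)
    return selected if maxfiles is None else selected[:maxfiles]


if __name__ == "__main__":
    names = [
        "exp001_traces.csv",
        "exp001_raw.csv",
        "exp001_video.avi",
        "Timing.xlsx",
        "exp002_traces.csv",
    ]
    print(select_files(names, key="exp001", antikey="raw",
                       whitelist=["Timing.xlsx"]))
    # → ['exp001_traces.csv', 'Timing.xlsx']
```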

Examples

>>> # Download CSV files containing 'experiment' in name
>>> success, files, log = download_part_of_folder(  
...     output='./data',
...     folder='https://drive.google.com/drive/folders/...',
...     key='experiment',
...     extensions=['.csv']
... )

driada.gdrive.initialize_iabs_router(root='/content', router_source=None)

Initialize the IABS data router from Google Sheets, URL, or DataFrame.

Initializes the IABS (Institute for Advanced Brain Studies) data router from various sources: config file URL, direct Google Sheets URL, or pre-loaded DataFrame.

Parameters:

root (str, optional) – Root directory where the router file will be saved (if downloading). Default is ‘/content’ (typically for Google Colab).
router_source (str, pandas.DataFrame, or None, optional) – Source of the router data. Default is None.
- None: Downloads from the URL in config.py (default behavior).
- str: Direct Google Sheets export URL (e.g., ‘https://docs.google.com/…/export?format=xlsx’).
- pandas.DataFrame: Pre-loaded router DataFrame with experiment data.

Returns:

data_router (pandas.DataFrame) – DataFrame containing experiment information and Google Drive links. Columns include experiment names and various data type links.
data_pieces (list of str) – List of data type column names that can be downloaded, excluding metadata columns.

Raises:

ImportError – If config.py not found or IABS_ROUTER_URL not defined in config.
requests.RequestException – If download from Google Sheets fails.
pd.errors.ParserError – If the downloaded file cannot be parsed as Excel.
OSError – If unable to create directory or write file.

Notes

Requires a config.py file with IABS_ROUTER_URL defined. See config_template.py for the required format.

WARNING: This function removes any existing router file before downloading the latest version. No backup is created.

Empty cells in the DataFrame are forward-filled to handle merged cells.

The following columns are excluded from data_pieces as they contain metadata rather than downloadable data:
- ‘Experiment’
- ‘Description’
- ‘Video’
- ‘Aligned data’
- ‘Computation results’
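
The two post-processing steps in these notes, forward-filling merged cells and deriving data_pieces, can be sketched without pandas. This is an illustration only: the helpers forward_fill and list_data_pieces and the column names ‘Traces’ and ‘Behavior’ are hypothetical. On a real DataFrame, pandas provides DataFrame.ffill() for the first step.

```python
METADATA_COLUMNS = (
    "Experiment",
    "Description",
    "Video",
    "Aligned data",
    "Computation results",
)


def forward_fill(rows):
    """Replace None cells with the last non-empty value above them."""
    filled, last = [], {}
    for row in rows:
        new_row = {}
        for col, value in row.items():
            if value is None:
                value = last.get(col)  # stays None if nothing precedes it
            new_row[col] = value
            last[col] = value
        filled.append(new_row)
    return filled


def list_data_pieces(columns):
    """Keep only the columns that hold downloadable data."""
    return [col for col in columns if col not in METADATA_COLUMNS]


if __name__ == "__main__":
    # A merged 'Experiment' cell spanning two rows reads as a value plus None.
    rows = [
        {"Experiment": "exp001", "Traces": "link-a", "Behavior": "link-b"},
        {"Experiment": None, "Traces": "link-c", "Behavior": None},
    ]
    table = forward_fill(rows)
    print(table[1]["Experiment"])
    print(list_data_pieces(table[0].keys()))
```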

Examples

>>> # Using config file URL (default)
>>> router, pieces = initialize_iabs_router()  

>>> # Using direct Google Sheets URL
>>> url = "https://docs.google.com/spreadsheets/d/.../export?format=xlsx"
>>> router, pieces = initialize_iabs_router(router_source=url)  

>>> # Using pre-loaded DataFrame
>>> import pandas as pd
>>> df = pd.read_excel("my_router.xlsx")
>>> router, pieces = initialize_iabs_router(router_source=df)

Usage Examples

Basic File Download

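from driada.gdrive import desktop_auth, download_gdrive_data

# Authenticate (only needed when via_pydrive=True)
auth = desktop_auth('path/to/client_secrets.json')

# Per the signature above, data_router is a router DataFrame or a folder
# share link, and expname acts as folder name and filename filter.
folder_url = 'https://drive.google.com/drive/folders/1abc123...'
success, log = download_gdrive_data(
    folder_url, 'experiment', via_pydrive=True, gauth=auth, tdir='local_data'
)

# Without authentication, the default gdown backend is used
success, log = download_gdrive_data(folder_url, 'experiment', tdir='local_data')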

Folder Download

# Download matching files from a shared folder
from driada.gdrive import desktop_auth, download_gdrive_data

auth = desktop_auth('path/to/client_secrets.json')
folder_url = 'https://drive.google.com/drive/folders/1xyz789...'
success, log = download_gdrive_data(
    folder_url,           # data_router: direct share link
    'experiment_folder',  # expname: target folder name and filename filter
    via_pydrive=True,
    gauth=auth,
    tdir='local_data'     # Files land in local_data/experiment_folder/
)

Selective Download

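from driada.gdrive import desktop_auth, download_part_of_folder

# via_pydrive=True requires an authenticated GoogleAuth object
auth = desktop_auth('path/to/client_secrets.json')

# Download only specific files from folder
success, files, log = download_part_of_folder(
    output='local_data/',
    folder='https://drive.google.com/drive/folders/1xyz789...',
    key='experiment',  # Files containing 'experiment'
    extensions=['.mat', '.npz'],  # Only these file types
    via_pydrive=True,
    gauth=auth,
    maxfiles=10  # Download at most 10 files
)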

IABS Router Setup

from driada.gdrive import initialize_iabs_router

# IMPORTANT: Requires config.py setup first:
# 1. Copy src/driada/gdrive/config_template.py to config.py
# 2. Set IABS_ROUTER_URL to your Google Sheets export URL
# 3. Add config.py to .gitignore

# Download and initialize IABS data router
router_df, data_pieces = initialize_iabs_router(root='/content')

# router_df contains experiment metadata with Drive links
# data_pieces lists downloadable data columns

Error Handling

Basic error handling example:

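import time

from driada.gdrive import download_gdrive_data

def download_with_retry(data_router, expname, max_retries=3, **kwargs):
    """Retry download_gdrive_data with exponential backoff."""
    for attempt in range(max_retries):
        try:
            # The function reports failure through its first return value
            success, log = download_gdrive_data(data_router, expname, **kwargs)
            if success:
                return True
            print(f"Attempt {attempt + 1} downloaded no files")
        except Exception as e:
            print(f"Attempt {attempt + 1} failed: {e}")
        if attempt < max_retries - 1:
            time.sleep(2 ** attempt)  # Exponential backoff: 1 s, 2 s, 4 s, ...
    return False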

Common Use Cases

Downloading Experimental Data:

# Standard workflow for DRIADA data
from driada.gdrive import desktop_auth, download_gdrive_data

auth = desktop_auth('path/to/client_secrets.json')

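# Download experiment files into experiments/mouse1_day1/
# (with a share link, expname also filters files by name)
exp_folder = 'https://drive.google.com/drive/folders/...'
success, log = download_gdrive_data(
    exp_folder,
    'mouse1_day1',
    via_pydrive=True,
    gauth=auth,
    tdir='experiments'
)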

# Load the experiment (requires exp_params)
from driada.experiment import load_experiment
exp_params = {'mouse': 'mouse1', 'date': 'day1', 'session': 1}
exp = load_experiment('local', exp_params, data_path='experiments/mouse1_day1/data.mat')