Download Functions
This module provides functions for downloading files and folders from Google Drive.
Functions
- driada.gdrive.download.download_gdrive_data(data_router, expname, whitelist=['Timing.xlsx'], via_pydrive=False, data_pieces=None, tdir='DRIADA data', gauth=None)[source]
Download experimental data from Google Drive based on a data router table or direct link.
Uses a data router DataFrame to locate and download experimental data files from Google Drive folders specified for each experiment. Alternatively, can accept a direct Google Drive share link to download from a single folder.
- Parameters:
data_router (pandas.DataFrame or str) – Either a DataFrame containing experiment names and corresponding Google Drive links for different data types (must have an ‘Experiment’ column), or a string containing a direct Google Drive share link to download from.
expname (str) – Name of the experiment to download data for. Must match an entry in the ‘Experiment’ column of data_router if data_router is a DataFrame. Used as folder name and filename filter if data_router is a share link.
whitelist (list of str, optional) – List of file names to always download regardless of naming patterns. Default is [‘Timing.xlsx’].
via_pydrive (bool, optional) – If True, use PyDrive2 for downloading (requires authentication). If False, use gdown. Default is False.
data_pieces (list of str or None, optional) – List of data types (column names) to download. If None, downloads all available data types except certain excluded ones. Default is None. Ignored when data_router is a share link.
tdir (str, optional) – Target directory name for downloaded data. Default is ‘DRIADA data’.
gauth (GoogleAuth object or None, optional) – PyDrive2 authentication object. Required if via_pydrive=True. Default is None.
- Returns:
success (bool) – True if at least one file was successfully downloaded, False otherwise.
load_log (list) – Captured output log from the download process.
- Raises:
ValueError – If data_router is neither a DataFrame nor a string, if data_router is a DataFrame but lacks the required ‘Experiment’ column, or if via_pydrive=True but gauth is None.
Notes
- When data_router is a DataFrame:
The function creates a directory structure: tdir/expname/data_type/ for organizing downloaded files. Data types excluded by default are: ‘Experiment’, ‘Description’, ‘Video’, ‘Aligned data’, ‘Computation results’.
- When data_router is a share link:
The function creates a directory structure: tdir/expname/ and downloads all files matching the expname filter.
Empty directories are automatically removed after download attempts.
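The directory layout described in these notes can be sketched with a small helper. This is only an illustration: expected_dirs and the example column names are hypothetical and not part of driada, and the excluded set is copied from the note above.

```python
from pathlib import Path

# Columns the notes list as excluded by default (metadata, not downloads).
EXCLUDED = {"Experiment", "Description", "Video", "Aligned data", "Computation results"}


def expected_dirs(tdir, expname, columns=None):
    """Return the directories a download is expected to populate.

    columns=None models the share-link case (tdir/expname/).
    A list of router column names models the DataFrame case
    (tdir/expname/data_type/ for each non-excluded column).
    """
    base = Path(tdir) / expname
    if columns is None:
        return [base]
    return [base / col for col in columns if col not in EXCLUDED]


# Hypothetical router columns, for illustration only.
router_columns = ["Experiment", "Description", "Traces", "Timing"]
for path in expected_dirs("DRIADA data", "exp001", router_columns):
    print(path.as_posix())
```

Directories that end up empty after the download attempt would not remain on disk, so the returned list is an upper bound on what exists afterwards.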
Examples
>>> # Using DataFrame router
>>> success, log = download_gdrive_data(
...     data_router=router_df,
...     expname='exp001'
... )
>>> # Using direct share link
>>> success, log = download_gdrive_data(
...     data_router='https://drive.google.com/drive/folders/...',
...     expname='exp001'
... )
- driada.gdrive.download_part_of_folder(output, folder, key='', antikey=None, whitelist=[], extensions=['.csv', '.xlsx', '.npz'], via_pydrive=False, gauth=None, maxfiles=None)[source]
Download specific files from a Google Drive folder based on filtering criteria.
Downloads files from a Google Drive folder that match specific name patterns and file extensions. Supports both gdown (no authentication) and PyDrive2 (requires authentication) methods.
- Parameters:
output (str) – Local directory path where files will be downloaded. Directory will be created if it doesn’t exist.
folder (str) – Google Drive folder share link. Must be a valid Google Drive URL.
key (str, optional) – Substring that must be present in file names to be downloaded. Default is empty string (matches all).
antikey (str or None, optional) – Substring that, if present in file names, will exclude them from download. Default is None.
whitelist (list of str, optional) – List of exact file names to download regardless of other criteria. Default is empty list.
extensions (list of str, optional) – List of allowed file extensions. Default is [‘.csv’, ‘.xlsx’, ‘.npz’].
via_pydrive (bool, optional) – If True, use PyDrive2 (requires authentication but supports more files). If False, use gdown (no auth but limited). Default is False.
gauth (GoogleAuth object or None, optional) – PyDrive2 authentication object. Required if via_pydrive=True. Default is None.
maxfiles (int or None, optional) – Maximum number of files to download. Default is None (no limit).
- Returns:
return_code (bool) – True if download completed successfully, False otherwise.
rel (list of tuple) – List of (file_id, file_name) tuples for downloaded files.
load_log (list) – Captured output log from the download process.
- Raises:
ValueError – If via_pydrive=True but gauth is None.
FileNotFoundError – If download fails when not using PyDrive2.
OSError – If unable to create output directory or write files.
Notes
When using PyDrive2, all filtering parameters (antikey, whitelist, extensions) are applied consistently with the gdown path.
Examples
>>> # Download CSV files containing 'experiment' in name
>>> success, files, log = download_part_of_folder(
...     output='./data',
...     folder='https://drive.google.com/drive/folders/...',
...     key='experiment',
...     extensions=['.csv']
... )
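How the filtering parameters might combine can be sketched offline on a plain list of file names. This is an assumption drawn from the parameter descriptions, not the library's actual implementation. select_files is a hypothetical helper, and the order in which maxfiles is applied is a guess.

```python
import os


def select_files(names, key="", antikey=None, whitelist=(),
                 extensions=(".csv", ".xlsx", ".npz"), maxfiles=None):
    """Return the names that pass the documented filters, in input order."""
    selected = []
    for name in names:
        if name in whitelist:
            keep = True  # exact-name match bypasses every other filter
        else:
            ext = os.path.splitext(name)[1]
            keep = (
                key in name
                and (antikey is None or antikey not in name)
                and ext in extensions
            )
        if keep:
            selected.append(name)
    # Assumption: maxfiles caps the list after filtering.
    return selected if maxfiles is None else selected[:maxfiles]


names = ["experiment_a.csv", "experiment_b_raw.csv", "notes.txt", "Timing.xlsx"]
print(select_files(names, key="experiment", antikey="raw", whitelist=["Timing.xlsx"]))
# → ['experiment_a.csv', 'Timing.xlsx']
```

Running candidate names through a check like this before a real download can help confirm that key, antikey, and extensions select the intended files.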
- driada.gdrive.initialize_iabs_router(root='/content', router_source=None)[source]
Initialize the IABS data router from Google Sheets, URL, or DataFrame.
Initializes the IABS (Institute for Advanced Brain Studies) data router from various sources: config file URL, direct Google Sheets URL, or pre-loaded DataFrame.
- Parameters:
root (str, optional) – Root directory where the router file will be saved (if downloading). Default is ‘/content’ (typically for Google Colab).
router_source (str, pandas.DataFrame, or None, optional) – Source of the router data. Default is None.
- None: Downloads from the URL in config.py (default behavior)
- str: Direct Google Sheets export URL (e.g., ‘https://docs.google.com/…/export?format=xlsx’)
- pandas.DataFrame: Pre-loaded router DataFrame with experiment data
- Returns:
data_router (pandas.DataFrame) – DataFrame containing experiment information and Google Drive links. Columns include experiment names and various data type links.
data_pieces (list of str) – List of data type column names that can be downloaded, excluding metadata columns.
- Raises:
ImportError – If config.py is not found or IABS_ROUTER_URL is not defined in it.
requests.RequestException – If download from Google Sheets fails.
pd.errors.ParserError – If the downloaded file cannot be parsed as Excel.
OSError – If unable to create directory or write file.
Notes
Requires a config.py file with IABS_ROUTER_URL defined. See config_template.py for the required format.
WARNING: This function removes any existing router file before downloading the latest version. No backup is created.
Empty cells in the DataFrame are forward-filled to handle merged cells.
The following columns are excluded from data_pieces as they contain metadata rather than downloadable data:
- ‘Experiment’
- ‘Description’
- ‘Video’
- ‘Aligned data’
- ‘Computation results’
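The forward-fill and column-exclusion steps from these notes can be illustrated without pandas. The sketch below uses hypothetical rows and helper names. It mirrors the described behavior on plain dictionaries and makes no claim about how driada implements it.

```python
METADATA_COLUMNS = {"Experiment", "Description", "Video", "Aligned data", "Computation results"}


def forward_fill(rows):
    """Replace None cells with the last non-empty value seen in that column."""
    last_seen = {}
    filled = []
    for row in rows:
        new_row = {}
        for col, value in row.items():
            if value is None:
                value = last_seen.get(col)  # stays None if nothing seen yet
            else:
                last_seen[col] = value
            new_row[col] = value
        filled.append(new_row)
    return filled


def data_pieces_from(columns):
    """Keep only the columns that point at downloadable data."""
    return [col for col in columns if col not in METADATA_COLUMNS]


# Hypothetical router rows: a merged 'Experiment' cell exports as one value
# followed by empty cells.
rows = [
    {"Experiment": "exp001", "Description": "day 1", "Traces": "link-a"},
    {"Experiment": None, "Description": "day 2", "Traces": "link-b"},
]
filled = forward_fill(rows)
print(filled[1]["Experiment"])           # → exp001
print(data_pieces_from(rows[0].keys()))  # → ['Traces']
```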
Examples
>>> # Using config file URL (default)
>>> router, pieces = initialize_iabs_router()
>>> # Using direct Google Sheets URL
>>> url = "https://docs.google.com/spreadsheets/d/.../export?format=xlsx"
>>> router, pieces = initialize_iabs_router(router_source=url)
>>> # Using pre-loaded DataFrame
>>> df = pd.read_excel("my_router.xlsx")
>>> router, pieces = initialize_iabs_router(router_source=df)
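If only a browser (edit or share) link to the spreadsheet is at hand, it may help to derive the export URL first. The helper below is a hypothetical sketch and not part of driada. It assumes the usual docs.google.com/spreadsheets/d/<id>/ link layout, and SHEET_ID_PLACEHOLDER stands in for a real spreadsheet ID.

```python
import re


def sheets_export_url(url, fmt="xlsx"):
    """Turn a Google Sheets URL into its export URL for the given format."""
    match = re.search(r"/spreadsheets/d/([A-Za-z0-9_-]+)", url)
    if match is None:
        raise ValueError(f"not a Google Sheets URL: {url!r}")
    return f"https://docs.google.com/spreadsheets/d/{match.group(1)}/export?format={fmt}"


# SHEET_ID_PLACEHOLDER is a stand-in, not a real spreadsheet ID.
edit_url = "https://docs.google.com/spreadsheets/d/SHEET_ID_PLACEHOLDER/edit#gid=0"
print(sheets_export_url(edit_url))
```

The resulting string could then be passed as router_source, provided the sheet is shared so that it can be downloaded.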
Usage Examples
Basic File Download
from driada.gdrive import desktop_auth, download_gdrive_data
# Authenticate
auth = desktop_auth('path/to/client_secrets.json')
# download_gdrive_data takes a folder share link (or a router DataFrame),
# not a single-file URL or a bare file ID
folder_url = 'https://drive.google.com/drive/folders/1xyz789...'
# Files matching 'experiment' are saved under local_data/experiment/
success, log = download_gdrive_data(
    folder_url,
    'experiment',
    via_pydrive=True,
    gauth=auth,
    tdir='local_data'
)
Folder Download
# Download every file with an allowed extension from a shared folder
from driada.gdrive import desktop_auth, download_part_of_folder
auth = desktop_auth('path/to/client_secrets.json')
folder_url = 'https://drive.google.com/drive/folders/1xyz789...'
return_code, files, log = download_part_of_folder(
    output='local_data/experiment_folder/',
    folder=folder_url,  # key='' (the default) matches every file name
    via_pydrive=True,
    gauth=auth
)
Selective Download
from driada.gdrive import desktop_auth, download_part_of_folder
# Authenticate (required because via_pydrive=True)
auth = desktop_auth('path/to/client_secrets.json')
# Download only specific files from folder
return_code, files, log = download_part_of_folder(
    output='local_data/',
    folder='https://drive.google.com/drive/folders/1xyz789...',
    key='experiment',  # Files containing 'experiment'
    extensions=['.mat', '.npz'],  # Only these file types
    via_pydrive=True,
    gauth=auth,
    maxfiles=10
)
IABS Router Setup
from driada.gdrive import initialize_iabs_router
# IMPORTANT: Requires config.py setup first:
# 1. Copy src/driada/gdrive/config_template.py to config.py
# 2. Set IABS_ROUTER_URL to your Google Sheets export URL
# 3. Add config.py to .gitignore
# Download and initialize IABS data router
router_df, data_pieces = initialize_iabs_router(root='/content')
# router_df contains experiment metadata with Drive links
# data_pieces lists downloadable data columns
Error Handling
Basic error handling example:
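import time
from driada.gdrive import download_gdrive_data

def download_with_retry(data_router, expname, max_retries=3, **kwargs):
    """Retry download_gdrive_data with exponential backoff."""
    for attempt in range(max_retries):
        try:
            success, log = download_gdrive_data(data_router, expname, **kwargs)
            if success:
                return True
            print(f"Attempt {attempt + 1} downloaded no files")
        except Exception as e:
            print(f"Attempt {attempt + 1} failed: {e}")
        if attempt < max_retries - 1:
            time.sleep(2 ** attempt)  # Exponential backoff
    return False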
Common Use Cases
Downloading Experimental Data:
# Standard workflow for DRIADA data
from driada.gdrive import desktop_auth, download_gdrive_data
auth = desktop_auth('path/to/client_secrets.json')
# Download the experiment's files into experiments/mouse1_day1/
exp_folder = 'https://drive.google.com/drive/folders/...'
success, log = download_gdrive_data(
    exp_folder,
    'mouse1_day1',
    via_pydrive=True,
    gauth=auth,
    tdir='experiments'
)
# Load the experiment (requires exp_params)
# data_path must point at a file that was actually downloaded
from driada.experiment import load_experiment
exp_params = {'mouse': 'mouse1', 'date': 'day1', 'session': 1}
exp = load_experiment('local', exp_params, data_path='experiments/mouse1_day1/data.mat')