Utility Functions
This module provides utility functions for Google Drive operations.
Classes
- class driada.gdrive.gdrive_utils.GoogleDriveFile(id, name, type, children=None)
Represents the structure of a Google Drive file or folder.
- children
If this object is a folder, the files and subfolders it contains.
- Type:
List[GoogleDriveFile]
- __init__(id, name, type, children=None)
Initialize GoogleDriveFile instance.
- Parameters:
id (str) – Unique file or folder ID from Google Drive.
name (str) – Display name of the file or folder.
type (str) – MIME type of the file, or ‘application/vnd.google-apps.folder’ for folders.
children (List[GoogleDriveFile], optional) – Child items if this is a folder. If omitted, an empty list is used.
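As a rough illustration of the structure described above, the sketch below models the same four fields with a hypothetical `DriveNode` dataclass (not part of driada) and counts files recursively. It assumes, per the documentation, that folders are identified by the MIME type `application/vnd.google-apps.folder`. `field(default_factory=list)` plays the role of the documented `children=None` default: each instance gets its own empty list instead of sharing one mutable default.

```python
from dataclasses import dataclass, field
from typing import List

FOLDER_MIME = "application/vnd.google-apps.folder"


@dataclass
class DriveNode:
    """Hypothetical stand-in mirroring the documented GoogleDriveFile fields."""
    id: str
    name: str
    type: str  # MIME type; FOLDER_MIME marks a folder
    # default_factory gives every instance its own list (a shared [] would leak)
    children: List["DriveNode"] = field(default_factory=list)

    @property
    def is_folder(self) -> bool:
        return self.type == FOLDER_MIME


def count_files(node: DriveNode) -> int:
    """Count non-folder items in the subtree rooted at `node`."""
    if not node.is_folder:
        return 1
    return sum(count_files(child) for child in node.children)


root = DriveNode("f0", "data", FOLDER_MIME, [
    DriveNode("a1", "run1.mat", "application/octet-stream"),
    DriveNode("f1", "sub", FOLDER_MIME,
              [DriveNode("b2", "run2.mat", "application/octet-stream")]),
])
print(count_files(root))  # 2
```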
Functions
- driada.gdrive.parse_google_drive_file(folder, content, use_cookies=True)
Extract information about the file or folder described by the current page, and about its children.
- Parameters:
folder (str) – URL of the Google Drive folder. Must be of the format ‘https://drive.google.com/drive/folders/{id}’.
content (str) – Google Drive’s raw HTML content.
use_cookies (bool, optional) – Whether to clear cookies. Default is True.
- Returns:
gdrive_file (GoogleDriveFile) – GoogleDriveFile object for the current page, with an empty children list.
id_name_type_iter (list) – List of tuples (id, name, type) for each child item.
- Raises:
RuntimeError – If folder information cannot be extracted from HTML.
Notes
Parses JavaScript data embedded in the Google Drive HTML. Expects a specific HTML structure and may break when Google Drive changes its page layout.
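Because the second return value is a flat list of `(id, name, type)` tuples, a caller that wants to descend into subfolders has to tell folders from files. A minimal sketch, assuming only the tuple layout and folder MIME type documented above; `split_children` and the sample tuples are hypothetical.

```python
from typing import Iterable, List, Tuple

FOLDER_MIME = "application/vnd.google-apps.folder"


def split_children(id_name_type_iter: Iterable[Tuple[str, str, str]]):
    """Separate (id, name, type) tuples into subfolders and files by MIME type."""
    folders: List[Tuple[str, str]] = []
    files: List[Tuple[str, str]] = []
    for item_id, name, mime in id_name_type_iter:
        target = folders if mime == FOLDER_MIME else files
        target.append((item_id, name))
    return folders, files


# Hypothetical tuples in the documented (id, name, type) layout
children = [
    ("1aaa", "session_01", FOLDER_MIME),
    ("1bbb", "notes.txt", "text/plain"),
]
folders, files = split_children(children)
print(folders)  # [('1aaa', 'session_01')]
print(files)    # [('1bbb', 'notes.txt')]
```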
- driada.gdrive.id_from_link(link)
Extract the file or folder ID from a Google Drive URL.
- Parameters:
link (str) – Google Drive URL containing the file or folder ID. Can be in one of these formats:
  - https://drive.google.com/drive/folders/{id}
  - https://drive.google.com/file/d/{id}/view
  - https://drive.google.com/open?id={id}
- Returns:
The extracted file or folder ID.
- Return type:
str
- Raises:
ValueError – If the link doesn’t contain ‘http’.
Examples
>>> id_from_link('https://drive.google.com/drive/folders/1a2b3c4d5e')
'1a2b3c4d5e'
>>> id_from_link('https://drive.google.com/open?id=xyz123')
'xyz123'
Notes
Does not validate the format of the extracted ID. May return an empty string or an invalid ID for malformed URLs.
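For readers who want to see how the three documented URL formats differ, the following is an illustrative stand-alone extractor built on `urllib.parse` and a regular expression. It is a sketch, not driada's implementation: `extract_drive_id` is a hypothetical name, and like the documented function it performs no validation of the ID itself.

```python
import re
from urllib.parse import parse_qs, urlparse


def extract_drive_id(link: str) -> str:
    """Illustrative extractor covering the three documented URL formats."""
    if "http" not in link:
        raise ValueError("link must be a full http(s) URL")
    parsed = urlparse(link)
    # Format: https://drive.google.com/open?id={id}
    query_id = parse_qs(parsed.query).get("id")
    if query_id:
        return query_id[0]
    # Formats: /drive/folders/{id} and /file/d/{id}/view
    match = re.search(r"/(?:folders|d)/([^/?#]+)", parsed.path)
    if match:
        return match.group(1)
    # No recognizable ID: return an empty string rather than raising
    return ""


print(extract_drive_id("https://drive.google.com/drive/folders/1a2b3c4d5e"))  # 1a2b3c4d5e
print(extract_drive_id("https://drive.google.com/open?id=xyz123"))          # xyz123
print(extract_drive_id("https://drive.google.com/file/d/abc789/view"))      # abc789
```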
- driada.gdrive.download_and_parse_google_drive_link(folder, quiet=False, use_cookies=True, remaining_ok=False, name_part='')
Get the folder structure of a Google Drive folder URL.
- Parameters:
folder (str) – URL of the Google Drive folder. Must be of the format ‘https://drive.google.com/drive/folders/{id}’.
quiet (bool, optional) – Suppress terminal output. Default is False.
use_cookies (bool, optional) – Flag to use cookies. Default is True.
remaining_ok (bool, optional) – Allow processing if folder has ≥50 files (API limit). Default is False.
name_part (str, optional) – Filter items by name substring. Default is empty string (no filter).
- Returns:
return_code (bool) – True if successful, False if failed (network error, permissions, etc.).
gdrive_file (GoogleDriveFile or None) – Folder structure with nested children, or None if failed.
- Raises:
RuntimeError – If folder has ≥50 files and remaining_ok is False.
Notes
Recursively processes subfolders. Limited to 50 items per folder due to Google Drive API restrictions.
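The recursion, per-folder limit, and name filter described above can be sketched with a pluggable listing function in place of the HTML download and parsing. This is a hypothetical model, not driada's code: `build_tree`, the dictionary-based nodes, and the `fake_drive` data are illustrative, and applying `name_part` only to files (so that subfolders are always descended into) is an assumption the documentation does not settle.

```python
from typing import Callable, Dict, List, Tuple

FOLDER_MIME = "application/vnd.google-apps.folder"
MAX_ITEMS = 50  # per-folder limit stated in the documentation

Item = Tuple[str, str, str]  # (id, name, type), as returned by the parser


def build_tree(folder_id: str, name: str,
               list_children: Callable[[str], List[Item]],
               remaining_ok: bool = False, name_part: str = "") -> Dict:
    """Recursively build nested {id, name, type, children} dicts."""
    items = list_children(folder_id)
    if len(items) >= MAX_ITEMS and not remaining_ok:
        raise RuntimeError(f"{name!r} has {len(items)} items (limit {MAX_ITEMS})")
    node = {"id": folder_id, "name": name, "type": FOLDER_MIME, "children": []}
    for item_id, item_name, mime in items:
        if mime == FOLDER_MIME:
            # Assumption: subfolders are always descended into
            child = build_tree(item_id, item_name, list_children,
                               remaining_ok, name_part)
            node["children"].append(child)
        elif name_part in item_name:  # an empty name_part keeps every file
            node["children"].append(
                {"id": item_id, "name": item_name, "type": mime, "children": []})
    return node


# Hypothetical listing: folder id -> its (id, name, type) children
fake_drive = {
    "root": [("s1", "session_01", FOLDER_MIME), ("n1", "notes.txt", "text/plain")],
    "s1": [("e1", "exp_a.mat", "application/octet-stream"),
           ("o1", "other.mat", "application/octet-stream")],
}
tree = build_tree("root", "data", lambda fid: fake_drive.get(fid, []), name_part="exp")
print([c["name"] for c in tree["children"]])                 # ['session_01']
print([c["name"] for c in tree["children"][0]["children"]])  # ['exp_a.mat']
```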
Usage Examples
File ID Extraction
from driada.gdrive import id_from_link
# Extract ID from various URL formats
urls = [
    'https://drive.google.com/file/d/1abc123.../view',
    'https://drive.google.com/open?id=1abc123...',
    'https://drive.google.com/drive/folders/1xyz789...',
    'https://docs.google.com/document/d/1doc456.../edit'  # not among the formats listed in the docstring
]
for url in urls:
    file_id = id_from_link(url)
    print(f"ID: {file_id}")
Download and Parse Links
from driada.gdrive import download_and_parse_google_drive_link
# Download and parse a Google Drive folder page
folder_url = 'https://drive.google.com/drive/folders/1xyz789...'
# Get folder info and its contents; raises RuntimeError if a folder
# has 50 or more items and remaining_ok is False
success, folder_file = download_and_parse_google_drive_link(
    folder_url,
    quiet=True,       # Suppress output
    name_part="exp"   # Filter by name
)
if success and folder_file:
    # folder_file is a GoogleDriveFile, so use attribute access
    print(f"Folder: {folder_file.name}")
    print(f"Contains {len(folder_file.children)} matching items")
    # Process direct children
    for child in folder_file.children:
        print(f"- {child.name} ({child.id})")
        print(f"  Type: {child.type}")
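The loop above visits only the top-level children, while the returned structure nests subfolders. A depth-first walk reaches every level; the sketch below assumes only that each node exposes the documented `id`, `name`, and `children` attributes. `iter_tree` and `render_tree` are hypothetical helpers, and `SimpleNamespace` objects stand in for a real result, for which you would pass `folder_file` directly.

```python
from types import SimpleNamespace


def iter_tree(node, depth=0):
    """Yield (depth, node) pairs for `node` and all descendants, depth-first."""
    yield depth, node
    for child in getattr(node, "children", None) or []:
        yield from iter_tree(child, depth + 1)


def render_tree(node) -> str:
    """Render the tree as an indented outline, two spaces per level."""
    return "\n".join(f"{'  ' * depth}- {n.name} ({n.id})" for depth, n in iter_tree(node))


# Stand-in objects with the documented attributes (id, name, children)
leaf = SimpleNamespace(id="b2", name="exp_b.mat", children=[])
sub = SimpleNamespace(id="f1", name="session_01", children=[leaf])
root = SimpleNamespace(id="f0", name="data", children=[sub])
print(render_tree(root))
```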
GoogleDriveFile Class
from driada.gdrive import GoogleDriveFile
# GoogleDriveFile is a simple data class
gdfile = GoogleDriveFile(
    id='1abc123...',
    name='experiment_data.mat',
    type='application/octet-stream'  # MIME type (illustrative value)
)
# Access attributes
print(f"File: {gdfile.name}")
print(f"ID: {gdfile.id}")
print(f"Type: {gdfile.type}")
# Use with download functions; `auth` is an authenticated session,
# e.g. one returned by desktop_auth as in the integration example
from driada.gdrive import download_gdrive_data
download_gdrive_data(auth, gdfile, 'local_path/')
Parse Google Drive HTML
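from driada.gdrive import parse_google_drive_file
import requests
# This is a low-level function used internally;
# it parses raw HTML that the caller has already fetched
folder_url = 'https://drive.google.com/drive/folders/1xyz789...'
# Get the raw HTML content first
response = requests.get(folder_url)
response.raise_for_status()
# Parse the content: folder_file has an empty children list,
# and children is a list of (id, name, type) tuples
folder_file, children = parse_google_drive_file(
    folder_url,
    response.text,
    use_cookies=True
)
for child_id, child_name, child_type in children:
    print(child_name, child_type)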
Integration with Other Functions
These utilities can be combined with the download functions, for example to download a file from a shared link:
from driada.gdrive import (
    id_from_link,
    download_gdrive_data,
    desktop_auth
)
# Authenticate
auth = desktop_auth('path/to/client_secrets.json')
# Extract ID from URL
url = 'https://drive.google.com/file/d/1abc123.../view?usp=sharing'
file_id = id_from_link(url)
# Download using extracted ID
download_gdrive_data(auth, file_id, 'local_data.mat')