Utility Functions

This module provides utility functions for Google Drive operations.

Classes

class driada.gdrive.gdrive_utils.GoogleDriveFile(id, name, type, children=None)

Represents the structure of a Google Drive file or folder.

id

Unique id, used to build the download URL.

Type: str

name

Actual name, used as file name.

Type: str

type

MIME type of the file, or application/vnd.google-apps.folder if it is a folder.

Type: str

children

For a folder, the files and subfolders it contains.

Type: List[GoogleDriveFile]

__init__(id, name, type, children=None)

Initialize GoogleDriveFile instance.

Parameters:

id (str) – Unique file or folder ID from Google Drive.
name (str) – Display name of the file or folder.
type (str) – MIME type of the file, or ‘application/vnd.google-apps.folder’ for folders.
children (List[GoogleDriveFile], optional) – Child items if this is a folder. Default is None, which is treated as an empty list.

is_folder()

Check if the GoogleDriveFile is a folder.

Returns: True if the file is a folder, False otherwise.
Return type: bool

Notes

Uses the global folder_type constant for comparison.

__repr__()

Return string representation of GoogleDriveFile.

Returns: Formatted string showing all attributes including children.
Return type: str

Notes

May produce long output if there are many children.
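
To illustrate the attributes and methods described above, here is a minimal sketch of a stand-in class. It is not driada's implementation: DriveNode, FOLDER_TYPE, and summarize are hypothetical names, and only the attribute names and the folder MIME type are taken from this page. The summarize helper returns one indented line per item, which may be easier to read than the full repr for deep trees.

```python
from dataclasses import dataclass, field
from typing import List

# MIME type that marks a folder, as documented for the `type` attribute.
FOLDER_TYPE = "application/vnd.google-apps.folder"


@dataclass
class DriveNode:
    """Hypothetical stand-in mirroring the documented GoogleDriveFile attributes."""
    id: str
    name: str
    type: str
    children: List["DriveNode"] = field(default_factory=list)

    def is_folder(self) -> bool:
        # Compare the MIME type against the folder constant.
        return self.type == FOLDER_TYPE


def summarize(node: DriveNode, depth: int = 0) -> List[str]:
    """Return one indented line per item instead of a single long repr."""
    marker = "/" if node.is_folder() else ""
    lines = ["  " * depth + node.name + marker]
    for child in node.children:
        lines.extend(summarize(child, depth + 1))
    return lines


# Illustrative tree with made-up IDs and names.
root = DriveNode("id0", "experiments", FOLDER_TYPE, [
    DriveNode("id1", "session1.mat", "application/octet-stream"),
    DriveNode("id2", "raw", FOLDER_TYPE),
])
print("\n".join(summarize(root)))
```

Folders are marked with a trailing slash, and each nesting level adds two spaces of indentation.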

Functions

driada.gdrive.parse_google_drive_file(folder, content, use_cookies=True)

Extract information about the folder shown on the current page and its children.

Parameters:

folder (str) – URL of the Google Drive folder. Must be of the format ‘https://drive.google.com/drive/folders/{id}’.
content (str) – Google Drive’s raw HTML content.
use_cookies (bool, optional) – Whether to clear cookies. Default is True.

Returns:

gdrive_file (GoogleDriveFile) – Current GoogleDriveFile object with empty children list.
id_name_type_iter (list) – List of tuples (id, name, type) for each child item.

Raises:

RuntimeError – If folder information cannot be extracted from HTML.

Notes

Parses JavaScript data embedded in Google Drive HTML. Expects specific HTML structure and may break with Google Drive updates.
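
Because the parser depends on page structure that can change, callers may want to treat a RuntimeError as a recoverable condition. The following is a minimal sketch under that assumption. The names try_parse and fake_parser are hypothetical, and the parser is passed in as an argument so the example runs without network access or driada installed.

```python
from typing import Callable, Optional, Tuple


def try_parse(parser: Callable[..., Tuple[object, list]],
              folder_url: str,
              html: str) -> Optional[Tuple[object, list]]:
    """Call a parser and return None instead of raising when extraction fails."""
    try:
        return parser(folder_url, html)
    except RuntimeError as exc:
        # Documented failure mode: folder information could not be extracted.
        print(f"Could not parse {folder_url}: {exc}")
        return None


# Hypothetical parser used only to demonstrate both outcomes.
def fake_parser(folder_url, html):
    if "folder-data" not in html:
        raise RuntimeError("folder information not found")
    return ("folder", [("id1", "a.txt", "text/plain")])


url = "https://drive.google.com/drive/folders/abc"  # placeholder ID
ok = try_parse(fake_parser, url, "<html>folder-data</html>")
failed = try_parse(fake_parser, url, "<html></html>")
```

With the real function, parse_google_drive_file would be passed in place of fake_parser.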

driada.gdrive.id_from_link(link)

Extract the file or folder ID from a Google Drive URL.

Parameters: link (str) – Google Drive URL containing the file or folder ID. Supported formats:

- https://drive.google.com/drive/folders/{id}
- https://drive.google.com/file/d/{id}/view
- https://drive.google.com/open?id={id}

Returns: The extracted file or folder ID.
Return type: str
Raises: ValueError – If the link doesn’t contain ‘http’.

Examples

>>> id_from_link('https://drive.google.com/drive/folders/1a2b3c4d5e')
'1a2b3c4d5e'
>>> id_from_link('https://drive.google.com/open?id=xyz123')
'xyz123'

Notes

Does not validate the extracted ID format. May return empty string or invalid IDs for malformed URLs.
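
Since the returned ID is not validated, a caller can add its own check before using it. The sketch below is one possible approach, not part of driada. The helper name require_plausible_id is hypothetical, and the pattern rests on two assumptions: that IDs contain only letters, digits, hyphens, and underscores, and that a minimum length of 5 is a reasonable cutoff (chosen arbitrarily).

```python
import re

# Assumption: IDs contain only URL-safe characters; the minimum length of 5 is arbitrary.
_ID_PATTERN = re.compile(r"[A-Za-z0-9_-]{5,}")


def require_plausible_id(candidate: str) -> str:
    """Return the ID unchanged, or raise ValueError if it looks malformed."""
    if not _ID_PATTERN.fullmatch(candidate):
        raise ValueError(f"Implausible Google Drive ID: {candidate!r}")
    return candidate


# An empty string, as may be returned for a malformed URL, is rejected.
try:
    require_plausible_id("")
except ValueError as exc:
    print(exc)
```

In practice the helper would wrap the extraction call, for example require_plausible_id(id_from_link(url)).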

driada.gdrive.download_and_parse_google_drive_link(folder, quiet=False, use_cookies=True, remaining_ok=False, name_part='')

Get the folder structure for a Google Drive folder URL.

Parameters:

folder (str) – URL of the Google Drive folder. Must be of the format ‘https://drive.google.com/drive/folders/{id}’.
quiet (bool, optional) – Suppress terminal output. Default is False.
use_cookies (bool, optional) – Flag to use cookies. Default is True.
remaining_ok (bool, optional) – Allow processing if folder has ≥50 files (API limit). Default is False.
name_part (str, optional) – Filter items by name substring. Default is empty string (no filter).

Returns:

return_code (bool) – True if successful, False if failed (network error, permissions, etc.).
gdrive_file (GoogleDriveFile or None) – Folder structure with nested children, or None if failed.

Raises:

RuntimeError – If folder has ≥50 files and remaining_ok is False.

Notes

Recursively processes subfolders. Limited to 50 items per folder due to Google Drive API restrictions.
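
The interaction between remaining_ok and the RuntimeError can be handled with a small wrapper. This is a sketch under stated assumptions, not driada code: fetch_folder and fake_fetch are hypothetical names, and the wrapper assumes that any RuntimeError from the call signals the 50-item limit. The fetch function is injected so the example runs offline.

```python
from typing import Callable, Optional, Tuple


def fetch_folder(fetch: Callable[..., Tuple[bool, Optional[object]]],
                 folder_url: str) -> Tuple[Optional[object], bool]:
    """Return (folder, possibly_truncated), retrying once with remaining_ok=True."""
    truncated = False
    try:
        success, folder = fetch(folder_url, quiet=True, remaining_ok=False)
    except RuntimeError:
        # Assumed to mean the folder holds 50 or more items; accept a partial listing.
        truncated = True
        success, folder = fetch(folder_url, quiet=True, remaining_ok=True)
    return (folder if success else None), truncated


# Hypothetical fetcher that mimics the documented behavior for a large folder.
def fake_fetch(folder_url, quiet=False, remaining_ok=False):
    if not remaining_ok:
        raise RuntimeError("folder has 50 or more files")
    return True, {"name": "big_folder"}


folder, truncated = fetch_folder(
    fake_fetch, "https://drive.google.com/drive/folders/abc"  # placeholder ID
)
```

With the real function, download_and_parse_google_drive_link would be passed as fetch. The truncated flag lets the caller warn that the listing may be incomplete.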

Usage Examples

File ID Extraction

from driada.gdrive import id_from_link

# Extract ID from various URL formats
urls = [
    'https://drive.google.com/file/d/1abc123.../view',
    'https://drive.google.com/open?id=1abc123...',
    'https://drive.google.com/drive/folders/1xyz789...',
    'https://docs.google.com/document/d/1doc456.../edit'
]

for url in urls:
    file_id = id_from_link(url)
    print(f"ID: {file_id}")

Download and Parse Links

from driada.gdrive import download_and_parse_google_drive_link

# Download and parse a Google Drive folder page
folder_url = 'https://drive.google.com/drive/folders/1xyz789...'

# Get folder info and its contents
success, folder_file = download_and_parse_google_drive_link(
    folder_url,
    quiet=True,  # Suppress output
    name_part="exp"  # Filter by name
)

if success and folder_file:
    # GoogleDriveFile exposes attributes, not dictionary keys
    print(f"Folder: {folder_file.name}")
    print(f"Contains {len(folder_file.children)} matching files")

    # Process children only when the folder was retrieved
    for child in folder_file.children:
        print(f"- {child.name} ({child.id})")
        print(f"  Type: {child.type}")

GoogleDriveFile Class

from driada.gdrive import GoogleDriveFile

# GoogleDriveFile is a simple data class
gdfile = GoogleDriveFile(
    id='1abc123...',
    name='experiment_data.mat',
    type='application/octet-stream'  # MIME type; illustrative value
)

# Access attributes
print(f"File: {gdfile.name}")
print(f"ID: {gdfile.id}")
print(f"Type: {gdfile.type}")

# Use with download functions
# `auth` is assumed to be an authenticated object, e.g. from desktop_auth(...)
from driada.gdrive import download_gdrive_data
download_gdrive_data(auth, gdfile, 'local_path/')

Parse Google Drive HTML

from driada.gdrive import parse_google_drive_file
import requests

# This is a low-level function used internally
# parse_google_drive_file requires HTML content
folder_url = 'https://drive.google.com/drive/folders/1xyz789...'

# Get the raw HTML content first
response = requests.get(folder_url)
response.raise_for_status()  # Fail early on HTTP errors

# Parse the content
folder_file, children = parse_google_drive_file(
    folder_url,
    response.text,
    use_cookies=True
)

Integration with Other Functions

These utilities can be combined with the download functions:

from driada.gdrive import (
    id_from_link,
    download_gdrive_data,
    desktop_auth
)

# Authenticate
auth = desktop_auth('path/to/client_secrets.json')

# Extract ID from URL
url = 'https://drive.google.com/file/d/1abc123.../view?usp=sharing'
file_id = id_from_link(url)

# Download using extracted ID
download_gdrive_data(auth, file_id, 'local_data.mat')