Utility Functions

This module provides utility functions for Google Drive operations.

Classes

class driada.gdrive.gdrive_utils.GoogleDriveFile(id, name, type, children=None)[source]

Represent Google Drive file objects structure.

id

Unique id, used to build the download URL.

Type:

str

name

Actual name, used as file name.

Type:

str

type

MIME type, or application/vnd.google-apps.folder if it is a folder.

Type:

str

children

For a folder, the files and subfolders it contains.

Type:

List[GoogleDriveFile]

__init__(id, name, type, children=None)[source]

Initialize GoogleDriveFile instance.

Parameters:
  • id (str) – Unique file or folder ID from Google Drive.

  • name (str) – Display name of the file or folder.

  • type (str) – MIME type of the file, or ‘application/vnd.google-apps.folder’ for folders.

  • children (List[GoogleDriveFile], optional) – Child items if this is a folder. Default is None, which is treated as an empty list.

is_folder()[source]

Check if the GoogleDriveFile is a folder.

Returns:

True if the file is a folder, False otherwise.

Return type:

bool

Notes

Uses the global folder_type constant for comparison.

__repr__()[source]

Return string representation of GoogleDriveFile.

Returns:

Formatted string showing all attributes including children.

Return type:

str

Notes

May produce long output if there are many children.
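The attribute layout described above can be illustrated with a small stand-in class. This is a minimal sketch, not the driada implementation: the name DriveNode and the constant FOLDER_TYPE are hypothetical, and the constant's value is assumed to equal the folder MIME type string documented above.

```python
from typing import List, Optional

# Assumed value of the folder constant; it matches the MIME type documented above.
FOLDER_TYPE = "application/vnd.google-apps.folder"


class DriveNode:
    """Hypothetical stand-in mirroring the documented GoogleDriveFile attributes."""

    def __init__(self, id: str, name: str, type: str,
                 children: Optional[List["DriveNode"]] = None):
        self.id = id
        self.name = name
        self.type = type
        # Give each instance its own list instead of sharing a mutable default.
        self.children = children if children is not None else []

    def is_folder(self) -> bool:
        # A node is a folder exactly when its type equals the folder MIME type.
        return self.type == FOLDER_TYPE

    def __repr__(self) -> str:
        # Children are included recursively, so the output grows with the tree.
        return (f"DriveNode(id={self.id!r}, name={self.name!r}, "
                f"type={self.type!r}, children={self.children!r})")


leaf = DriveNode("f1", "data.mat", "application/octet-stream")
root = DriveNode("d1", "experiment", FOLDER_TYPE, children=[leaf])
```

Because the representation nests the children, printing a large folder tree produces correspondingly long output, which is the behavior the note on __repr__ warns about.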

Functions

driada.gdrive.parse_google_drive_file(folder, content, use_cookies=True)[source]

Extract information about the current page file and its children.

Parameters:
  • folder (str) – URL of the Google Drive folder. Must be of the format ‘https://drive.google.com/drive/folders/{id}’.

  • content (str) – Google Drive’s raw HTML content.

  • use_cookies (bool, optional) – Whether to clear cookies. Default is True.

Returns:

  • gdrive_file (GoogleDriveFile) – Current GoogleDriveFile object with empty children list.

  • id_name_type_iter (list) – List of tuples (id, name, type) for each child item.

Raises:

RuntimeError – If folder information cannot be extracted from HTML.

Notes

Parses the JavaScript data embedded in the Google Drive HTML. It expects a specific HTML structure and may break when Google Drive changes its page layout.

driada.gdrive.id_from_link(link)[source]

Extract the file or folder ID from a Google Drive URL.

Parameters:

link (str) – Google Drive URL containing the file or folder ID. Supported formats:
  • https://drive.google.com/drive/folders/{id}
  • https://drive.google.com/file/d/{id}/view
  • https://drive.google.com/open?id={id}

Returns:

The extracted file or folder ID.

Return type:

str

Raises:

ValueError – If the link doesn’t contain ‘http’.

Examples

>>> id_from_link('https://drive.google.com/drive/folders/1a2b3c4d5e')
'1a2b3c4d5e'
>>> id_from_link('https://drive.google.com/open?id=xyz123')
'xyz123'

Notes

Does not validate the extracted ID format. May return empty string or invalid IDs for malformed URLs.
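One way such an extraction could work for the URL forms listed above is sketched below using only the standard library. The helper name extract_drive_id is hypothetical and this is not the driada implementation; it only mirrors the documented behavior (a ValueError when 'http' is missing, an empty string when no ID is found).

```python
from urllib.parse import urlparse, parse_qs


def extract_drive_id(link: str) -> str:
    """Hypothetical helper illustrating ID extraction; not part of driada."""
    if "http" not in link:
        raise ValueError("link must be an http(s) URL")

    parsed = urlparse(link)

    # Query form: https://drive.google.com/open?id={id}
    query_id = parse_qs(parsed.query).get("id")
    if query_id:
        return query_id[0]

    # Path forms: the ID is the segment after 'folders' or after 'd'.
    segments = [s for s in parsed.path.split("/") if s]
    for marker in ("folders", "d"):
        if marker in segments:
            idx = segments.index(marker)
            if idx + 1 < len(segments):
                return segments[idx + 1]

    # No recognizable ID; the ID format itself is never validated.
    return ""
```

The sketch checks the query string before the path so that an open?id= link is not misread through its path segments.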

Get folder structure of Google Drive folder URL.

Parameters:
  • folder (str) – URL of the Google Drive folder. Must be of the format ‘https://drive.google.com/drive/folders/{id}’.

  • quiet (bool, optional) – Suppress terminal output. Default is False.

  • use_cookies (bool, optional) – Flag to use cookies. Default is True.

  • remaining_ok (bool, optional) – Allow processing if folder has ≥50 files (API limit). Default is False.

  • name_part (str, optional) – Filter items by name substring. Default is empty string (no filter).

Returns:

  • return_code (bool) – True if successful, False if failed (network error, permissions, etc.).

  • gdrive_file (GoogleDriveFile or None) – Folder structure with nested children, or None if failed.

Raises:

RuntimeError – If folder has ≥50 files and remaining_ok is False.

Notes

Recursively processes subfolders. Limited to 50 items per folder due to Google Drive API restrictions.
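A nested result of this kind can be flattened with a short recursive walk. The sketch below is illustrative only: Node is a hypothetical stand-in with the same four attributes as GoogleDriveFile, FOLDER_TYPE is an assumed constant, and iter_files is not a driada function.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Tuple

# Assumed folder MIME type, as documented for the type attribute.
FOLDER_TYPE = "application/vnd.google-apps.folder"


@dataclass
class Node:
    """Hypothetical stand-in for a GoogleDriveFile-like tree node."""
    id: str
    name: str
    type: str
    children: List["Node"] = field(default_factory=list)


def iter_files(node: Node, prefix: str = "") -> Iterator[Tuple[str, str]]:
    """Yield (relative_path, id) for every non-folder item, depth first."""
    path = f"{prefix}/{node.name}" if prefix else node.name
    if node.type != FOLDER_TYPE:
        yield path, node.id
        return
    # Folders contribute nothing themselves; recurse into their children.
    for child in node.children:
        yield from iter_files(child, path)


tree = Node("d1", "root", FOLDER_TYPE, [
    Node("f1", "a.mat", "application/octet-stream"),
    Node("d2", "sub", FOLDER_TYPE, [
        Node("f2", "b.mat", "application/octet-stream"),
    ]),
])

files = list(iter_files(tree))
```

Here files holds one (path, id) pair per file, with paths built from the folder names along the way.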

Usage Examples

File ID Extraction

from driada.gdrive import id_from_link

# Extract ID from various URL formats
urls = [
    'https://drive.google.com/file/d/1abc123.../view',
    'https://drive.google.com/open?id=1abc123...',
    'https://drive.google.com/drive/folders/1xyz789...',
    'https://docs.google.com/document/d/1doc456.../edit'
]

for url in urls:
    file_id = id_from_link(url)
    print(f"ID: {file_id}")

GoogleDriveFile Class

from driada.gdrive import GoogleDriveFile

# GoogleDriveFile is a simple data class
gdfile = GoogleDriveFile(
    id='1abc123...',
    name='experiment_data.mat',
    type='application/octet-stream'  # MIME type, as described in the attribute docs
)

# Access attributes
print(f"File: {gdfile.name}")
print(f"ID: {gdfile.id}")
print(f"Type: {gdfile.type}")

# Use with download functions
# `auth` must be created first, e.g. with desktop_auth
# (see "Integration with Other Functions")
from driada.gdrive import download_gdrive_data
download_gdrive_data(auth, gdfile, 'local_path/')

Parse Google Drive HTML

from driada.gdrive import parse_google_drive_file
import requests

# This is a low-level function used internally
# parse_google_drive_file requires HTML content
folder_url = 'https://drive.google.com/drive/folders/1xyz789...'

# Get the raw HTML content first
response = requests.get(folder_url)
response.raise_for_status()  # stop early on HTTP errors

# Parse the content
folder_file, children = parse_google_drive_file(
    folder_url,
    response.text,
    use_cookies=True
)

Integration with Other Functions

These utilities can be combined with the download functions:

from driada.gdrive import (
    id_from_link,
    download_gdrive_data,
    desktop_auth
)

# Authenticate
auth = desktop_auth('path/to/client_secrets.json')

# Extract ID from URL
url = 'https://drive.google.com/file/d/1abc123.../view?usp=sharing'
file_id = id_from_link(url)

# Download using extracted ID
download_gdrive_data(auth, file_id, 'local_data.mat')