diff --git a/experimental/docs/gdrive_backend_setup.md b/experimental/docs/gdrive_backend_setup.md new file mode 100644 index 000000000..51c7b8502 --- /dev/null +++ b/experimental/docs/gdrive_backend_setup.md @@ -0,0 +1,197 @@ +# Google Drive Backend Setup Guide + +This guide will help you set up and use the Google Drive backend for Ragas datasets. + +## Prerequisites + +### 1. Install Dependencies + +```bash +pip install google-api-python-client google-auth google-auth-oauthlib +``` + +### 2. Set up Google Cloud Project + +1. Go to the [Google Cloud Console](https://console.cloud.google.com/) +2. Create a new project or select an existing one +3. Enable the following APIs: + - Google Drive API + - Google Sheets API + +### 3. Create Credentials + +You have two options for authentication: + +#### Option A: OAuth 2.0 (Recommended for development) + +1. In Google Cloud Console, go to "Credentials" +2. Click "Create Credentials" → "OAuth client ID" +3. Choose "Desktop application" +4. Download the JSON file +5. Save it securely (e.g., as `credentials.json`) + +#### Option B: Service Account (Recommended for production) + +1. In Google Cloud Console, go to "Credentials" +2. Click "Create Credentials" → "Service account" +3. Fill in the details and create the account +4. Generate a key (JSON format) +5. Download and save the JSON file securely +6. Share your Google Drive folder with the service account email + +## Setup Instructions + +### 1. Create a Google Drive Folder + +1. Create a folder in Google Drive where you want to store your datasets +2. Get the folder ID from the URL: + ``` + https://drive.google.com/drive/folders/FOLDER_ID_HERE + ``` +3. If using a service account, share this folder with the service account email + +### 2. Set Environment Variables (Optional) + +```bash +export GDRIVE_FOLDER_ID="your_folder_id_here" +export GDRIVE_CREDENTIALS_PATH="path/to/credentials.json" +# OR for service account: +export GDRIVE_SERVICE_ACCOUNT_PATH="path/to/service_account.json" +``` + +### 3. Basic Usage + +```python +from ragas_experimental.project.core import Project +from pydantic import BaseModel + +# Define your data model +class EvaluationEntry(BaseModel): + question: str + answer: str + score: float + +# Create project with Google Drive backend +project = Project.create( + name="my_project", + backend="gdrive", + gdrive_folder_id="your_folder_id_here", + gdrive_credentials_path="path/to/credentials.json" # OAuth + # OR + # gdrive_service_account_path="path/to/service_account.json" # Service Account +) + +# Create a dataset +dataset = project.create_dataset( + model=EvaluationEntry, + name="my_dataset" +) + +# Add data +entry = EvaluationEntry( + question="What is AI?", + answer="Artificial Intelligence", + score=0.95 +) +dataset.append(entry) + +# Load and access data +dataset.load() +print(f"Dataset has {len(dataset)} entries") +for entry in dataset: + print(f"{entry.question} -> {entry.answer}") +``` + +## File Structure + +When you use the Google Drive backend, it creates the following structure: + +``` +Your Google Drive Folder/ +├── project_name/ +│ ├── datasets/ +│ │ ├── dataset1.gsheet +│ │ └── dataset2.gsheet +│ └── experiments/ +│ └── experiment1.gsheet +``` + +Each dataset is stored as a Google Sheet with: +- Column headers matching your model fields +- An additional `_row_id` column for internal tracking +- Automatic type conversion when loading data + +## Authentication Flow + +### OAuth (First Time) +1. When you first run your code, a browser window will open +2. 
Sign in to your Google account
+3. Grant permissions to access Google Drive
+4. A `token.json` file will be created automatically
+5. Subsequent runs will use this token (no browser needed)
+
+### Service Account
+1. No interactive authentication required
+2. Make sure the service account has access to your folder
+3. The JSON key file is used directly
+
+## Troubleshooting
+
+### Common Issues
+
+1. **"Folder not found" error**
+   - Check that the folder ID is correct
+   - Ensure the folder is shared with your service account (if using one)
+
+2. **Authentication errors**
+   - Verify your credentials file path
+   - Check that the required APIs are enabled
+   - For OAuth: delete `token.json` and re-authenticate
+
+3. **Permission errors**
+   - Make sure your account has edit access to the folder
+   - For service accounts: share the folder with the service account email
+
+4. **Import errors**
+   - Install required dependencies: `pip install google-api-python-client google-auth google-auth-oauthlib`
+
+### Getting Help
+
+If you encounter issues:
+1. Check the error message carefully
+2. Verify your Google Cloud setup
+3. Test authentication with a simple Google Drive API call (see "Verifying Access" below)
+4. Check that all dependencies are installed
+
+## Security Best Practices
+
+1. **Never commit credentials to version control**
+2. **Use environment variables for sensitive information**
+3. **Limit service account permissions to the minimum required**
+4. **Regularly rotate service account keys**
+5. **Use OAuth for development, service accounts for production**
+
+## Advanced Configuration
+
+### Custom Authentication Paths
+
+```python
+project = Project.create(
+    name="my_project",
+    backend="gdrive",
+    gdrive_folder_id="folder_id",
+    gdrive_credentials_path="/custom/path/to/credentials.json",
+    gdrive_token_path="/custom/path/to/token.json"
+)
+```
+
+### Multiple Projects
+
+You can have multiple projects in the same Google Drive folder:
+
+```python
+project1 = Project.create(name="project1", backend="gdrive", ...)
+project2 = Project.create(name="project2", backend="gdrive", ...)
+```
+
+Each will create its own subfolder structure.
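+
+### Verifying Access
+
+If the backend raises errors, it can help to confirm your credentials and folder ID work outside of Ragas first. The snippet below is a minimal sketch assuming a service account; the key path and folder ID are placeholders to fill in (adapt the auth if you use OAuth):
+
+```python
+from google.oauth2.service_account import Credentials
+from googleapiclient.discovery import build
+
+# Authenticate with the same Drive scope the backend uses
+creds = Credentials.from_service_account_file(
+    "path/to/service_account.json",
+    scopes=["https://www.googleapis.com/auth/drive"],
+)
+drive = build("drive", "v3", credentials=creds)
+
+# Fetch the folder metadata; an error here usually means a wrong folder ID
+# or a folder that was never shared with the service account email
+folder = drive.files().get(fileId="your_folder_id_here").execute()
+print(f"Connected. Folder name: {folder['name']}")
+```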
diff --git a/experimental/examples/gdrive_backend_example.py b/experimental/examples/gdrive_backend_example.py
new file mode 100644
index 000000000..724436347
--- /dev/null
+++ b/experimental/examples/gdrive_backend_example.py
@@ -0,0 +1,121 @@
+"""
+Example usage of the Google Drive backend for Ragas.
+
+This example shows how to:
+1. Set up authentication for Google Drive
+2. Create a project with Google Drive backend
+3. Create and manage datasets stored in Google Sheets
+
+Prerequisites:
+1. Install required dependencies:
+   pip install google-api-python-client google-auth google-auth-oauthlib
+
+2. Set up Google Drive API credentials:
+   - Go to Google Cloud Console
+   - Enable Google Drive API and Google Sheets API
+   - Create credentials (either OAuth or Service Account)
+   - Download the JSON file
+
+3. Set environment variables or provide paths directly
+"""
+
+import os
+from pydantic import BaseModel
+from ragas_experimental.project.core import Project
+
+
+# Example model for our dataset
+class EvaluationEntry(BaseModel):
+    question: str
+    answer: str
+    context: str
+    score: float
+    feedback: str
+
+
+def example_oauth_setup():
+    """Example using OAuth authentication."""
+
+    # Set up environment variables (or pass directly to Project.create)
+    # os.environ["GDRIVE_FOLDER_ID"] = "your_google_drive_folder_id_here"
+    # os.environ["GDRIVE_CREDENTIALS_PATH"] = "path/to/your/credentials.json"
+
+    # Create project with Google Drive backend
+    project = Project.create(
+        name="my_ragas_project",
+        description="A project using Google Drive for storage",
+        backend="gdrive",
+        gdrive_folder_id="your_google_drive_folder_id_here",
+        gdrive_credentials_path="path/to/your/credentials.json",
+        gdrive_token_path="token.json"  # Will be created automatically
+    )
+
+    return project
+
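+
+# A service-account variant referenced by example_usage() below. This is a
+# sketch based on the parameters documented in gdrive_backend_setup.md; the
+# folder ID and key path are placeholders for your own values.
+def example_service_account_setup():
+    """Example using service account authentication."""
+
+    # Set up environment variables (or pass directly to Project.create)
+    # os.environ["GDRIVE_FOLDER_ID"] = "your_google_drive_folder_id_here"
+    # os.environ["GDRIVE_SERVICE_ACCOUNT_PATH"] = "path/to/service_account.json"
+
+    project = Project.create(
+        name="my_ragas_project",
+        description="A project using Google Drive for storage",
+        backend="gdrive",
+        gdrive_folder_id="your_google_drive_folder_id_here",
+        gdrive_service_account_path="path/to/service_account.json",
+    )
+
+    return project
+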
+
+def example_usage():
+    """Example of using the Google Drive backend."""
+
+    # Create a project (choose one of the authentication methods above)
+    project = example_oauth_setup()  # or example_service_account_setup()
+
+    # Create a dataset
+    dataset = project.create_dataset(
+        model=EvaluationEntry,
+        name="evaluation_results"
+    )
+
+    # Add some entries
+    entry1 = EvaluationEntry(
+        question="What is the capital of France?",
+        answer="Paris",
+        context="France is a country in Europe.",
+        score=0.95,
+        feedback="Correct answer"
+    )
+
+    entry2 = EvaluationEntry(
+        question="What is 2+2?",
+        answer="4",
+        context="Basic arithmetic question.",
+        score=1.0,
+        feedback="Perfect answer"
+    )
+
+    # Append entries to the dataset
+    dataset.append(entry1)
+    dataset.append(entry2)
+
+    # Load all entries
+    dataset.load()
+    print(f"Dataset contains {len(dataset)} entries")
+
+    # Access entries
+    for i, entry in enumerate(dataset):
+        print(f"Entry {i}: {entry.question} -> {entry.answer} (Score: {entry.score})")
+
+    # Update an entry; reassigning the item writes the change back to the sheet
+    dataset[0].score = 0.98
+    dataset[0].feedback = "Updated feedback"
+    dataset[0] = dataset[0]  # Trigger update
+
+    # Search for entries (uses an internal backend helper, not the public API)
+    entry = dataset._backend.get_entry_by_field("question", "What is 2+2?", EvaluationEntry)
+    if entry:
+        print(f"Found entry: {entry.answer}")
+
+    return dataset
+
+
+if __name__ == "__main__":
+    # Run the example
+    try:
+        dataset = example_usage()
+        print("Google Drive backend example completed successfully!")
+    except Exception as e:
+        print(f"Error: {e}")
+        print("\nMake sure to:")
+        print("1. Install required dependencies")
+        print("2. Set up Google Drive API credentials")
+        print("3. Update the folder ID and credential paths in this example")
diff --git a/experimental/ragas_experimental/backends/__init__.py b/experimental/ragas_experimental/backends/__init__.py
index e69de29bb..2ea90e1f2 100644
--- a/experimental/ragas_experimental/backends/__init__.py
+++ b/experimental/ragas_experimental/backends/__init__.py
@@ -0,0 +1,19 @@
+# Optional imports for backends that require additional dependencies
+
+# Always available backends
+from .ragas_api_client import RagasApiClient
+from .factory import RagasApiClientFactory
+
+# Conditionally import Google Drive backend
+try:
+    from .gdrive_backend import GDriveBackend
+    __all__ = ["RagasApiClient", "RagasApiClientFactory", "GDriveBackend"]
+except ImportError:
+    __all__ = ["RagasApiClient", "RagasApiClientFactory"]
+
+# Conditionally import Notion backend if available
+try:
+    from .notion_backend import NotionBackend
+    __all__.append("NotionBackend")
+except ImportError:
+    pass
diff --git a/experimental/ragas_experimental/backends/base.py b/experimental/ragas_experimental/backends/base.py
new file mode 100644
index 000000000..d3042d933
--- /dev/null
+++ b/experimental/ragas_experimental/backends/base.py
@@ -0,0 +1,46 @@
+"""Base classes for dataset backends."""
+
+from abc import ABC, abstractmethod
+import typing as t
+
+
+class DatasetBackend(ABC):
+    """Abstract base class for dataset backends.
+
+    All dataset storage backends must implement these methods.
+    """
+
+    @abstractmethod
+    def initialize(self, dataset):
+        """Initialize the backend with dataset information"""
+        pass
+
+    @abstractmethod
+    def get_column_mapping(self, model):
+        """Get mapping between model fields and backend columns"""
+        pass
+
+    @abstractmethod
+    def load_entries(self, model_class):
+        """Load all entries from storage"""
+        pass
+
+    @abstractmethod
+    def append_entry(self, entry):
+        """Add a new entry to storage and return its ID"""
+        pass
+
+    @abstractmethod
+    def update_entry(self, entry):
+        """Update an existing entry in storage"""
+        pass
+
+    @abstractmethod
+    def delete_entry(self, entry_id):
+        """Delete an entry from storage"""
+        pass
+
+    @abstractmethod
+    def get_entry_by_field(self, field_name: str, field_value: t.Any, model_class):
+        """Get an entry by field value"""
+        pass
diff --git a/experimental/ragas_experimental/backends/gdrive_backend.py b/experimental/ragas_experimental/backends/gdrive_backend.py
new file mode 100644
index 000000000..e72a76db7
--- /dev/null
+++ b/experimental/ragas_experimental/backends/gdrive_backend.py
@@ -0,0 +1,410 @@
+"""Google Drive backend for storing datasets in Google Sheets."""
+
+import typing as t
+import os
+import uuid
+
+try:
+    from googleapiclient.discovery import build
+    from google.oauth2.service_account import Credentials
+    from google.oauth2.credentials import Credentials as UserCredentials
+    from google.auth.transport.requests import Request
+    from google_auth_oauthlib.flow import InstalledAppFlow
+except ImportError:
+    raise ImportError(
+        "Google Drive backend requires google-api-python-client and google-auth. 
" + "Install with: pip install google-api-python-client google-auth google-auth-oauthlib" + ) + +from .base import DatasetBackend +from ..utils import create_nano_id + + +class GDriveBackend(DatasetBackend): + """Backend for storing datasets in Google Drive using Google Sheets.""" + + # Scopes needed for Google Drive and Sheets API + SCOPES = [ + 'https://www.googleapis.com/auth/drive', + 'https://www.googleapis.com/auth/spreadsheets' + ] + + def __init__( + self, + folder_id: str, + project_id: str, + dataset_id: str, + dataset_name: str, + type: t.Literal["datasets", "experiments"], + credentials_path: t.Optional[str] = None, + service_account_path: t.Optional[str] = None, + token_path: t.Optional[str] = None, + ): + """Initialize the Google Drive backend. + + Args: + folder_id: The ID of the Google Drive folder to store datasets + project_id: The ID of the project + dataset_id: The ID of the dataset + dataset_name: The name of the dataset + type: Type of data (datasets or experiments) + credentials_path: Path to OAuth credentials JSON file + service_account_path: Path to service account JSON file + token_path: Path to store OAuth token + """ + self.folder_id = folder_id + self.project_id = project_id + self.dataset_id = dataset_id + self.dataset_name = dataset_name + self.type = type + self.dataset = None + + # Authentication paths + self.credentials_path = credentials_path or os.getenv("GDRIVE_CREDENTIALS_PATH") + self.service_account_path = service_account_path or os.getenv("GDRIVE_SERVICE_ACCOUNT_PATH") + self.token_path = token_path or os.getenv("GDRIVE_TOKEN_PATH", "token.json") + + # Initialize Google API clients + self._setup_auth() + + def __str__(self): + return f"GDriveBackend(folder_id={self.folder_id}, project_id={self.project_id}, dataset_id={self.dataset_id}, dataset_name={self.dataset_name})" + + def __repr__(self): + return self.__str__() + + def _setup_auth(self): + """Set up authentication for Google APIs.""" + creds = None + + # Try service account authentication first + if self.service_account_path and os.path.exists(self.service_account_path): + creds = Credentials.from_service_account_file( + self.service_account_path, scopes=self.SCOPES + ) + # Try OAuth authentication + elif self.credentials_path and os.path.exists(self.credentials_path): + # Load existing token if available + if os.path.exists(self.token_path): + creds = UserCredentials.from_authorized_user_file(self.token_path, self.SCOPES) + + # If there are no (valid) credentials available, let the user log in + if not creds or not creds.valid: + if creds and creds.expired and creds.refresh_token: + creds.refresh(Request()) + else: + flow = InstalledAppFlow.from_client_secrets_file( + self.credentials_path, self.SCOPES + ) + creds = flow.run_local_server(port=0) + + # Save the credentials for the next run + with open(self.token_path, 'w') as token: + token.write(creds.to_json()) + else: + raise ValueError( + "No valid authentication method found. Please provide either:\n" + "1. Service account JSON file path via service_account_path or GDRIVE_SERVICE_ACCOUNT_PATH\n" + "2. 
OAuth credentials JSON file path via credentials_path or GDRIVE_CREDENTIALS_PATH"
+            )
+
+        # Build the services
+        self.drive_service = build('drive', 'v3', credentials=creds)
+        self.sheets_service = build('sheets', 'v4', credentials=creds)
+
+    def initialize(self, dataset):
+        """Initialize the backend with the dataset instance."""
+        self.dataset = dataset
+
+        # Ensure the project folder structure exists
+        self._ensure_folder_structure()
+
+        # Ensure the spreadsheet exists
+        self._ensure_spreadsheet_exists()
+
+    def _ensure_folder_structure(self):
+        """Create the folder structure in Google Drive if it doesn't exist."""
+        try:
+            # Check that the main folder exists and is accessible
+            self.drive_service.files().get(fileId=self.folder_id).execute()
+        except Exception:
+            raise ValueError(f"Folder with ID {self.folder_id} not found or not accessible")
+
+        # Create project folder if it doesn't exist
+        project_folder_id = self._get_or_create_folder(self.project_id, self.folder_id)
+
+        # Create type folder (datasets/experiments) if it doesn't exist
+        self.type_folder_id = self._get_or_create_folder(self.type, project_folder_id)
+
+    def _get_or_create_folder(self, folder_name: str, parent_id: str) -> str:
+        """Get existing folder ID or create new folder."""
+        # Search for existing folder
+        query = f"name='{folder_name}' and '{parent_id}' in parents and mimeType='application/vnd.google-apps.folder' and trashed=false"
+        results = self.drive_service.files().list(q=query).execute()
+        folders = results.get('files', [])
+
+        if folders:
+            return folders[0]['id']
+
+        # Create new folder
+        folder_metadata = {
+            'name': folder_name,
+            'parents': [parent_id],
+            'mimeType': 'application/vnd.google-apps.folder'
+        }
+        folder = self.drive_service.files().create(body=folder_metadata).execute()
+        return folder['id']
+
+    def _ensure_spreadsheet_exists(self):
+        """Create the Google Sheet if it doesn't exist."""
+        spreadsheet_name = f"{self.dataset_name}.gsheet"
+
+        # Search for existing spreadsheet
+        query = f"name='{spreadsheet_name}' and '{self.type_folder_id}' in parents and mimeType='application/vnd.google-apps.spreadsheet' and trashed=false"
+        results = self.drive_service.files().list(q=query).execute()
+        sheets = results.get('files', [])
+
+        if sheets:
+            self.spreadsheet_id = sheets[0]['id']
+            # Always ensure headers are correct for existing sheets
+            self._initialize_spreadsheet_headers()
+        else:
+            # Create new spreadsheet
+            spreadsheet_metadata = {
+                'name': spreadsheet_name,
+                'parents': [self.type_folder_id],
+                'mimeType': 'application/vnd.google-apps.spreadsheet'
+            }
+            spreadsheet = self.drive_service.files().create(body=spreadsheet_metadata).execute()
+            self.spreadsheet_id = spreadsheet['id']
+
+            # Initialize with headers
+            self._initialize_spreadsheet_headers()
+
+    def _initialize_spreadsheet_headers(self):
+        """Initialize the spreadsheet with headers."""
+        if self.dataset is not None:
+            try:
+                # Include _row_id in the headers
+                expected_headers = ["_row_id"] + list(self.dataset.model.model_fields.keys())
+
+                # Check if headers are already correct
+                try:
+                    result = self.sheets_service.spreadsheets().values().get(
+                        spreadsheetId=self.spreadsheet_id,
+                        range="1:1"  # First row only
+                    ).execute()
+
+                    existing_headers = result.get('values', [[]])[0] if result.get('values') else []
+
+                    # If headers are already correct, don't change anything
+                    if existing_headers == expected_headers:
+                        return
+
+                except Exception:
+                    # If we can't read headers, proceed with setting them
+                    pass
+
+                # Clear and set headers (this will remove 
all existing data!) + self.sheets_service.spreadsheets().values().clear( + spreadsheetId=self.spreadsheet_id, + range="A:Z" + ).execute() + + self.sheets_service.spreadsheets().values().update( + spreadsheetId=self.spreadsheet_id, + range="A1", + valueInputOption="RAW", + body={"values": [expected_headers]} + ).execute() + + except Exception as e: + print(f"Warning: Could not initialize spreadsheet headers: {e}") + + def get_column_mapping(self, model) -> t.Dict: + """Get mapping between model fields and spreadsheet columns.""" + # For Google Sheets, column names directly match field names (like CSV) + return model.model_fields + + def load_entries(self, model_class): + """Load all entries from the Google Sheet.""" + try: + # Get all data from the sheet + result = self.sheets_service.spreadsheets().values().get( + spreadsheetId=self.spreadsheet_id, + range="A:Z" + ).execute() + + values = result.get('values', []) + if not values: + return [] + + # First row contains headers + headers = values[0] + entries = [] + + for row_data in values[1:]: # Skip header row + try: + # Skip empty rows + if not row_data or all(not cell.strip() for cell in row_data if cell): + continue + + # Pad row data to match headers length + while len(row_data) < len(headers): + row_data.append("") + + # Create dictionary from headers and row data + row_dict = dict(zip(headers, row_data)) + + # Extract row_id and remove from model data + row_id = row_dict.get("_row_id", str(uuid.uuid4())) + + # Create model data without _row_id + model_data = {k: v for k, v in row_dict.items() if k != "_row_id"} + + # Skip if all values are empty + if not any(str(v).strip() for v in model_data.values()): + continue + + # Convert types as needed + typed_row = {} + for field, value in model_data.items(): + if field in model_class.model_fields: + field_type = model_class.model_fields[field].annotation + + # Handle basic type conversions + if field_type == int: + typed_row[field] = int(value) if value else 0 + elif field_type == float: + typed_row[field] = float(value) if value else 0.0 + elif field_type == bool: + typed_row[field] = value.lower() in ("true", "t", "yes", "y", "1") + else: + typed_row[field] = value + + # Create model instance + entry = model_class(**typed_row) + entry._row_id = row_id + entries.append(entry) + + except Exception as e: + print(f"Error loading row from Google Sheets: {e}") + + return entries + + except Exception as e: + print(f"Error loading entries from Google Sheets: {e}") + return [] + + def append_entry(self, entry): + """Add a new entry to the Google Sheet and return a generated ID.""" + # Generate row ID if needed + row_id = getattr(entry, "_row_id", None) or str(uuid.uuid4()) + + # Convert entry to row data + entry_dict = entry.model_dump() + entry_dict["_row_id"] = row_id + + # Get headers to maintain order + headers = ["_row_id"] + list(entry.model_fields.keys()) + row_data = [entry_dict.get(header, "") for header in headers] + + # Append to sheet + self.sheets_service.spreadsheets().values().append( + spreadsheetId=self.spreadsheet_id, + range="A:A", + valueInputOption="RAW", + insertDataOption="INSERT_ROWS", + body={"values": [row_data]} + ).execute() + + return row_id + + def update_entry(self, entry): + """Update an existing entry in the Google Sheet.""" + if not hasattr(entry, "_row_id") or not entry._row_id: + raise ValueError("Cannot update: entry has no row ID") + + # Find the row with this ID + result = self.sheets_service.spreadsheets().values().get( + spreadsheetId=self.spreadsheet_id, + 
range="A:A" + ).execute() + + row_ids = [row[0] if row else "" for row in result.get('values', [])] + + try: + row_index = row_ids.index(entry._row_id) + row_number = row_index + 1 # Sheets are 1-indexed + + # Convert entry to row data + entry_dict = entry.model_dump() + entry_dict["_row_id"] = entry._row_id + + # Get headers to maintain order + headers = ["_row_id"] + list(entry.model_fields.keys()) + row_data = [entry_dict.get(header, "") for header in headers] + + # Update the specific row + self.sheets_service.spreadsheets().values().update( + spreadsheetId=self.spreadsheet_id, + range=f"A{row_number}:Z{row_number}", + valueInputOption="RAW", + body={"values": [row_data]} + ).execute() + + return True + + except ValueError: + # Row ID not found, append as new entry + return self.append_entry(entry) + + def delete_entry(self, entry_id): + """Delete an entry from the Google Sheet.""" + # Find the row with this ID + result = self.sheets_service.spreadsheets().values().get( + spreadsheetId=self.spreadsheet_id, + range="A:A" + ).execute() + + row_ids = [row[0] if row else "" for row in result.get('values', [])] + + try: + row_index = row_ids.index(entry_id) + row_number = row_index + 1 # Sheets are 1-indexed + + # Delete the row + request = { + 'deleteDimension': { + 'range': { + 'sheetId': 0, # Assume first sheet + 'dimension': 'ROWS', + 'startIndex': row_index, + 'endIndex': row_index + 1 + } + } + } + + self.sheets_service.spreadsheets().batchUpdate( + spreadsheetId=self.spreadsheet_id, + body={'requests': [request]} + ).execute() + + return True + + except ValueError: + # Row ID not found + return False + + def get_entry_by_field(self, field_name: str, field_value: t.Any, model_class): + """Get an entry by field value.""" + # Load all entries and search + entries = self.load_entries(model_class) + + for entry in entries: + if hasattr(entry, field_name) and getattr(entry, field_name) == field_value: + return entry + + return None diff --git a/experimental/ragas_experimental/dataset.py b/experimental/ragas_experimental/dataset.py index 527ded343..c865d5302 100644 --- a/experimental/ragas_experimental/dataset.py +++ b/experimental/ragas_experimental/dataset.py @@ -3,10 +3,9 @@ # AUTOGENERATED! DO NOT EDIT! File to edit: ../nbs/api/dataset.ipynb. # %% auto 0 -__all__ = ['BaseModelType', 'DatasetBackend', 'RagasAppBackend', 'LocalBackend', 'create_dataset_backend', 'Dataset'] +__all__ = ['BaseModelType', 'DatasetBackend', 'RagasAppBackend', 'LocalBackend', 'GDriveBackend', 'create_dataset_backend', 'Dataset'] # %% ../nbs/api/dataset.ipynb 2 -from abc import ABC, abstractmethod import os import typing as t import csv @@ -20,55 +19,22 @@ ) from .utils import create_nano_id, async_to_sync, get_test_directory from .backends.ragas_api_client import RagasApiClient +from .backends.base import DatasetBackend from .typing import SUPPORTED_BACKENDS import ragas_experimental.typing as rt from .metric import MetricResult +# Import GDriveBackend with optional import +try: + from .backends.gdrive_backend import GDriveBackend + GDRIVE_AVAILABLE = True +except ImportError: + GDRIVE_AVAILABLE = False + GDriveBackend = None + # %% ../nbs/api/dataset.ipynb 3 BaseModelType = t.TypeVar("BaseModelType", bound=BaseModel) - -class DatasetBackend(ABC): - """Abstract base class for dataset backends. - - All dataset storage backends must implement these methods. 
- """ - - @abstractmethod - def initialize(self, dataset): - """Initialize the backend with dataset information""" - pass - - @abstractmethod - def get_column_mapping(self, model): - """Get mapping between model fields and backend columns""" - pass - - @abstractmethod - def load_entries(self, model_class): - """Load all entries from storage""" - pass - - @abstractmethod - def append_entry(self, entry): - """Add a new entry to storage and return its ID""" - pass - - @abstractmethod - def update_entry(self, entry): - """Update an existing entry in storage""" - pass - - @abstractmethod - def delete_entry(self, entry_id): - """Delete an entry from storage""" - pass - - @abstractmethod - def get_entry_by_field(self, field_name: str, field_value: t.Any, model_class): - """Get an entry by field value""" - pass - # %% ../nbs/api/dataset.ipynb 4 class RagasAppBackend(DatasetBackend): """Backend for storing datasets using the Ragas API.""" @@ -472,7 +438,7 @@ def create_dataset_backend(backend_type: SUPPORTED_BACKENDS, **kwargs): """Factory function to create the appropriate backend. Args: - backend_type: The type of backend to create (ragas_app or local) + backend_type: The type of backend to create (ragas_app, local, or gdrive) **kwargs: Arguments specific to the backend Returns: @@ -482,9 +448,19 @@ def create_dataset_backend(backend_type: SUPPORTED_BACKENDS, **kwargs): "ragas_app": RagasAppBackend, "local": LocalBackend, } + + # Add GDriveBackend if available + if GDRIVE_AVAILABLE and GDriveBackend is not None: + backend_classes["gdrive"] = GDriveBackend + elif backend_type == "gdrive": + raise ImportError( + "Google Drive backend requires additional dependencies. " + "Install with: pip install google-api-python-client google-auth google-auth-oauthlib" + ) if backend_type not in backend_classes: - raise ValueError(f"Unsupported backend: {backend_type}") + available_backends = list(backend_classes.keys()) + raise ValueError(f"Unsupported backend: {backend_type}. Available backends: {available_backends}") return backend_classes[backend_type](**kwargs) @@ -506,6 +482,10 @@ def __init__( ragas_api_client: t.Optional[RagasApiClient] = None, backend: SUPPORTED_BACKENDS = "local", local_root_dir: t.Optional[str] = None, + gdrive_folder_id: t.Optional[str] = None, + gdrive_credentials_path: t.Optional[str] = None, + gdrive_service_account_path: t.Optional[str] = None, + gdrive_token_path: t.Optional[str] = None, ): """Initialize a Dataset with the specified backend. 
@@ -514,9 +494,14 @@ def __init__( model: The Pydantic model class for entries project_id: The ID of the parent project dataset_id: The ID of this dataset + datatable_type: Type of datatable (datasets or experiments) ragas_api_client: Required for ragas_app backend - backend: The storage backend to use (ragas_app or local) + backend: The storage backend to use (ragas_app, local, or gdrive) local_root_dir: Required for local backend + gdrive_folder_id: Required for gdrive backend - Google Drive folder ID + gdrive_credentials_path: Optional for gdrive backend - OAuth credentials path + gdrive_service_account_path: Optional for gdrive backend - Service account path + gdrive_token_path: Optional for gdrive backend - Token storage path """ # Store basic properties self.name = name @@ -547,6 +532,19 @@ def __init__( "dataset_name": name, "type": self.datatable_type, } + elif backend == "gdrive": + if gdrive_folder_id is None: + raise ValueError("gdrive_folder_id is required for gdrive backend") + backend_params = { + "folder_id": gdrive_folder_id, + "project_id": project_id, + "dataset_id": dataset_id, + "dataset_name": name, + "type": self.datatable_type, + "credentials_path": gdrive_credentials_path, + "service_account_path": gdrive_service_account_path, + "token_path": gdrive_token_path, + } self._backend = create_dataset_backend(backend, **backend_params) diff --git a/experimental/ragas_experimental/project/core.py b/experimental/ragas_experimental/project/core.py index b709e1171..8a3a0dd39 100644 --- a/experimental/ragas_experimental/project/core.py +++ b/experimental/ragas_experimental/project/core.py @@ -43,6 +43,10 @@ def __init__( self._ragas_api_client = RagasApiClientFactory.create() else: self._ragas_api_client = ragas_api_client + elif backend == "gdrive": + # Google Drive backend initialization is handled in create() method + # since it requires additional parameters + pass else: raise ValueError(f"Invalid backend: {backend}") @@ -59,6 +63,12 @@ def __init__( elif backend == "local": self.name = self.project_id self.description = "" + elif backend == "gdrive": + # For gdrive, name and description are set in create() method + if not hasattr(self, 'name'): + self.name = self.project_id + if not hasattr(self, 'description'): + self.description = "" def _create_local_project_structure(self): """Create the local directory structure for the project""" @@ -77,6 +87,12 @@ def create( backend: rt.SUPPORTED_BACKENDS = "local", root_dir: t.Optional[str] = None, ragas_api_client: t.Optional[RagasApiClient] = None, + # Google Drive backend parameters + gdrive_folder_id: t.Optional[str] = None, + gdrive_service_account_path: t.Optional[str] = None, + gdrive_credentials_path: t.Optional[str] = None, + gdrive_token_path: t.Optional[str] = None, + **kwargs ): if backend == "ragas_app": ragas_api_client = ragas_api_client or RagasApiClientFactory.create() @@ -91,6 +107,26 @@ def create( # For local backend, we use the name as the project_id project_id = name return cls(project_id, backend="local", root_dir=root_dir) + elif backend == "gdrive": + if gdrive_folder_id is None: + raise ValueError("gdrive_folder_id is required for Google Drive backend") + + # Create project instance with Google Drive backend + project = cls.__new__(cls) + project.project_id = name + project.name = name + project.description = description + project.backend = backend + + # Store Google Drive configuration + project._gdrive_folder_id = gdrive_folder_id + project._gdrive_service_account_path = gdrive_service_account_path + 
project._gdrive_credentials_path = gdrive_credentials_path + project._gdrive_token_path = gdrive_token_path + + return project + else: + raise ValueError(f"Unsupported backend: {backend}") # %% ../../nbs/api/project/core.ipynb 9 @patch diff --git a/experimental/ragas_experimental/project/datasets.py b/experimental/ragas_experimental/project/datasets.py index 5f77c0cd2..0bd2882c2 100644 --- a/experimental/ragas_experimental/project/datasets.py +++ b/experimental/ragas_experimental/project/datasets.py @@ -101,6 +101,37 @@ def get_dataset_from_local( local_root_dir=os.path.dirname(self._root_dir), # Root dir for all projects ) +# %% ../../nbs/api/project/datasets.ipynb 6a +def get_dataset_from_gdrive( + self: Project, name: str, model: t.Type[BaseModel] +) -> Dataset: + """Create a dataset in the Google Drive backend. + + Args: + name: Name of the dataset + model: Pydantic model defining the structure + + Returns: + Dataset: A new dataset configured to use the Google Drive backend + """ + # Use a UUID as the dataset ID + dataset_id = create_nano_id() + + # Return a new Dataset instance with Google Drive backend + return Dataset( + name=name if name is not None else model.__name__, + model=model, + datatable_type="datasets", + project_id=self.project_id, + dataset_id=dataset_id, + backend="gdrive", + # Pass Google Drive configuration from project + gdrive_folder_id=getattr(self, '_gdrive_folder_id', None), + gdrive_service_account_path=getattr(self, '_gdrive_service_account_path', None), + gdrive_credentials_path=getattr(self, '_gdrive_credentials_path', None), + gdrive_token_path=getattr(self, '_gdrive_token_path', None), + ) + # %% ../../nbs/api/project/datasets.ipynb 7 @patch def create_dataset( @@ -132,6 +163,8 @@ def create_dataset( return get_dataset_from_local(self, name, model) elif backend == "ragas_app": return get_dataset_from_ragas_app(self, name, model) + elif backend == "gdrive": + return get_dataset_from_gdrive(self, name, model) else: raise ValueError(f"Unsupported backend: {backend}") diff --git a/experimental/ragas_experimental/typing.py b/experimental/ragas_experimental/typing.py index 9e1b42deb..18b8ce677 100644 --- a/experimental/ragas_experimental/typing.py +++ b/experimental/ragas_experimental/typing.py @@ -22,7 +22,7 @@ import typing as t # Define supported backends -SUPPORTED_BACKENDS = t.Literal["ragas_app", "local"] +SUPPORTED_BACKENDS = t.Literal["ragas_app", "local", "gdrive"] # %% ../nbs/api/typing.ipynb 6 class ColumnType(str, Enum): diff --git a/experimental/tests/test_gdrive_backend.py b/experimental/tests/test_gdrive_backend.py new file mode 100644 index 000000000..5ee779630 --- /dev/null +++ b/experimental/tests/test_gdrive_backend.py @@ -0,0 +1,543 @@ +""" +Unit tests for Google Drive backend implementation. +These tests comprehensively test the GDriveBackend class with proper mocking. 
+""" + +import pytest +import uuid +import json +import sys +from unittest.mock import Mock, patch, MagicMock, mock_open +from pydantic import BaseModel + +from ragas_experimental.typing import SUPPORTED_BACKENDS +from ragas_experimental.dataset import create_dataset_backend +from ragas_experimental.project.core import Project + + +class SampleModel(BaseModel): + name: str + value: int + description: str + + +class TestGDriveBackendSupport: + """Test Google Drive backend support in the system.""" + + def test_gdrive_backend_in_supported_backends(self): + """Test that gdrive is included in supported backends.""" + assert "gdrive" in SUPPORTED_BACKENDS.__args__ + + def test_create_dataset_backend_handles_gdrive_missing_deps(self): + """Test that factory handles missing Google Drive dependencies gracefully.""" + # Temporarily modify GDRIVE_AVAILABLE to simulate missing dependencies + from ragas_experimental import dataset + original_gdrive_available = dataset.GDRIVE_AVAILABLE + original_gdrive_backend = dataset.GDriveBackend + + try: + # Simulate missing dependencies + dataset.GDRIVE_AVAILABLE = False + dataset.GDriveBackend = None + + with pytest.raises(ImportError, match="Google Drive backend requires additional dependencies"): + create_dataset_backend( + "gdrive", + folder_id="test_folder", + project_id="test_project", + dataset_id="test_dataset", + dataset_name="test_dataset", + type="datasets" + ) + finally: + # Restore original values + dataset.GDRIVE_AVAILABLE = original_gdrive_available + dataset.GDriveBackend = original_gdrive_backend + + +class TestGDriveBackendInitialization: + """Test GDriveBackend initialization and authentication setup.""" + + @patch('ragas_experimental.backends.gdrive_backend.build') + @patch('ragas_experimental.backends.gdrive_backend.Credentials') + @patch('os.path.exists') + def test_service_account_auth_success(self, mock_exists, mock_credentials, mock_build): + """Test successful service account authentication.""" + mock_exists.return_value = True + mock_creds = Mock() + mock_credentials.from_service_account_file.return_value = mock_creds + mock_drive_service = Mock() + mock_sheets_service = Mock() + mock_build.side_effect = [mock_drive_service, mock_sheets_service] + + try: + from ragas_experimental.backends.gdrive_backend import GDriveBackend + + backend = GDriveBackend( + folder_id="test_folder", + project_id="test_project", + dataset_id="test_dataset", + dataset_name="test_dataset", + type="datasets", + service_account_path="/path/to/service_account.json" + ) + + assert backend.folder_id == "test_folder" + assert backend.project_id == "test_project" + assert backend.dataset_id == "test_dataset" + assert backend.dataset_name == "test_dataset" + assert backend.type == "datasets" + assert backend.drive_service == mock_drive_service + assert backend.sheets_service == mock_sheets_service + + mock_credentials.from_service_account_file.assert_called_once_with( + "/path/to/service_account.json", + scopes=['https://www.googleapis.com/auth/drive', 'https://www.googleapis.com/auth/spreadsheets'] + ) + + except ImportError: + pytest.skip("Google Drive dependencies not available") + + @patch('ragas_experimental.backends.gdrive_backend.build') + @patch('ragas_experimental.backends.gdrive_backend.UserCredentials') + @patch('ragas_experimental.backends.gdrive_backend.InstalledAppFlow') + @patch('os.path.exists') + def test_oauth_auth_new_token(self, mock_exists, mock_flow, mock_user_creds, mock_build): + """Test OAuth authentication with new token creation.""" + # 
Mock file existence checks + def exists_side_effect(path): + return path == "/path/to/credentials.json" + mock_exists.side_effect = exists_side_effect + + # Mock OAuth flow + mock_flow_instance = Mock() + mock_flow.from_client_secrets_file.return_value = mock_flow_instance + mock_creds = Mock() + mock_creds.valid = True + mock_flow_instance.run_local_server.return_value = mock_creds + mock_creds.to_json.return_value = '{"token": "test"}' + + try: + from ragas_experimental.backends.gdrive_backend import GDriveBackend + + with patch("builtins.open", mock_open()) as mock_file: + backend = GDriveBackend( + folder_id="test_folder", + project_id="test_project", + dataset_id="test_dataset", + dataset_name="test_dataset", + type="datasets", + credentials_path="/path/to/credentials.json", + token_path="/path/to/token.json" + ) + + # Verify OAuth flow was initiated + mock_flow.from_client_secrets_file.assert_called_once() + mock_flow_instance.run_local_server.assert_called_once_with(port=0) + mock_file.assert_called_with("/path/to/token.json", 'w') + + except ImportError: + pytest.skip("Google Drive dependencies not available") + + @patch('ragas_experimental.backends.gdrive_backend.build') + @patch('os.path.exists') + def test_auth_failure_no_credentials(self, mock_exists, mock_build): + """Test authentication failure when no credentials are provided.""" + mock_exists.return_value = False + + try: + from ragas_experimental.backends.gdrive_backend import GDriveBackend + + with pytest.raises(ValueError, match="No valid authentication method found"): + GDriveBackend( + folder_id="test_folder", + project_id="test_project", + dataset_id="test_dataset", + dataset_name="test_dataset", + type="datasets" + ) + + except ImportError: + pytest.skip("Google Drive dependencies not available") + + +class TestGDriveBackendFolderManagement: + """Test folder structure management in Google Drive.""" + + def _create_mock_backend(self): + """Helper to create a mocked GDriveBackend instance.""" + with patch('ragas_experimental.backends.gdrive_backend.build'): + with patch('ragas_experimental.backends.gdrive_backend.Credentials'): + with patch('os.path.exists', return_value=True): + try: + from ragas_experimental.backends.gdrive_backend import GDriveBackend + return GDriveBackend( + folder_id="test_folder", + project_id="test_project", + dataset_id="test_dataset", + dataset_name="test_dataset", + type="datasets", + service_account_path="/fake/path.json" + ) + except ImportError: + pytest.skip("Google Drive dependencies not available") + + def test_ensure_folder_structure_success(self): + """Test successful folder structure creation.""" + backend = self._create_mock_backend() + if backend is None: + return + + # Mock successful folder operations + backend.drive_service.files().get.return_value.execute.return_value = {"id": "main_folder"} + backend.drive_service.files().list.return_value.execute.side_effect = [ + {"files": []}, # No project folder exists + {"files": []}, # No type folder exists + ] + backend.drive_service.files().create.return_value.execute.side_effect = [ + {"id": "project_folder_id"}, + {"id": "type_folder_id"} + ] + + backend._ensure_folder_structure() + + assert backend.type_folder_id == "type_folder_id" + assert backend.drive_service.files().create.call_count == 2 + + def test_ensure_folder_structure_existing_folders(self): + """Test folder structure with existing folders.""" + backend = self._create_mock_backend() + if backend is None: + return + + 
backend.drive_service.files().get.return_value.execute.return_value = {"id": "main_folder"} + backend.drive_service.files().list.return_value.execute.side_effect = [ + {"files": [{"id": "existing_project_folder"}]}, # Project folder exists + {"files": [{"id": "existing_type_folder"}]}, # Type folder exists + ] + + backend._ensure_folder_structure() + + assert backend.type_folder_id == "existing_type_folder" + backend.drive_service.files().create.assert_not_called() + + def test_ensure_folder_structure_invalid_main_folder(self): + """Test folder structure with invalid main folder.""" + backend = self._create_mock_backend() + if backend is None: + return + + backend.drive_service.files().get.side_effect = Exception("Not found") + + with pytest.raises(ValueError, match="Folder with ID test_folder not found"): + backend._ensure_folder_structure() + + +class TestGDriveBackendSpreadsheetManagement: + """Test spreadsheet management operations.""" + + def _create_mock_backend_with_dataset(self): + """Helper to create a mocked backend with a dataset.""" + backend = TestGDriveBackendFolderManagement()._create_mock_backend() + if backend is None: + return None + + # Mock a simple dataset + mock_dataset = Mock() + mock_dataset.model = SampleModel + backend.dataset = mock_dataset + backend.type_folder_id = "type_folder_id" + + return backend + + def test_ensure_spreadsheet_create_new(self): + """Test creating a new spreadsheet.""" + backend = self._create_mock_backend_with_dataset() + if backend is None: + return + + # Mock no existing spreadsheet + backend.drive_service.files().list.return_value.execute.return_value = {"files": []} + backend.drive_service.files().create.return_value.execute.return_value = {"id": "new_spreadsheet_id"} + + # Mock sheets service for header initialization + backend.sheets_service.spreadsheets().values().get.return_value.execute.return_value = {"values": []} + backend.sheets_service.spreadsheets().values().clear.return_value.execute.return_value = {} + backend.sheets_service.spreadsheets().values().update.return_value.execute.return_value = {} + + backend._ensure_spreadsheet_exists() + + assert backend.spreadsheet_id == "new_spreadsheet_id" + backend.drive_service.files().create.assert_called_once() + + def test_ensure_spreadsheet_use_existing(self): + """Test using existing spreadsheet.""" + backend = self._create_mock_backend_with_dataset() + if backend is None: + return + + # Mock existing spreadsheet + backend.drive_service.files().list.return_value.execute.return_value = { + "files": [{"id": "existing_spreadsheet_id"}] + } + + # Mock existing headers that match expected + expected_headers = ["_row_id", "name", "value", "description"] + backend.sheets_service.spreadsheets().values().get.return_value.execute.return_value = { + "values": [expected_headers] + } + + backend._ensure_spreadsheet_exists() + + assert backend.spreadsheet_id == "existing_spreadsheet_id" + backend.drive_service.files().create.assert_not_called() + + def test_initialize_spreadsheet_headers(self): + """Test spreadsheet header initialization.""" + backend = self._create_mock_backend_with_dataset() + if backend is None: + return + + backend.spreadsheet_id = "test_spreadsheet" + + # Mock no existing headers + backend.sheets_service.spreadsheets().values().get.return_value.execute.return_value = {"values": []} + backend.sheets_service.spreadsheets().values().clear.return_value.execute.return_value = {} + backend.sheets_service.spreadsheets().values().update.return_value.execute.return_value = {} + + 
backend._initialize_spreadsheet_headers() + + # Verify headers were set + expected_headers = ["_row_id", "name", "value", "description"] + backend.sheets_service.spreadsheets().values().update.assert_called_once() + call_args = backend.sheets_service.spreadsheets().values().update.call_args + assert call_args[1]["body"]["values"][0] == expected_headers + + +class TestGDriveBackendDataOperations: + """Test data operations (CRUD) on spreadsheets.""" + + def _create_mock_backend_with_spreadsheet(self): + """Helper to create a mocked backend with spreadsheet setup.""" + backend = TestGDriveBackendFolderManagement()._create_mock_backend() + if backend is None: + return None + + backend.spreadsheet_id = "test_spreadsheet" + return backend + + def test_load_entries_success(self): + """Test successful loading of entries from spreadsheet.""" + backend = self._create_mock_backend_with_spreadsheet() + if backend is None: + return + + # Mock spreadsheet data + mock_data = { + "values": [ + ["_row_id", "name", "value", "description"], # Headers + ["row1", "Item 1", "10", "First item"], + ["row2", "Item 2", "20", "Second item"], + ["", "", "", ""], # Empty row should be skipped + ] + } + backend.sheets_service.spreadsheets().values().get.return_value.execute.return_value = mock_data + + entries = backend.load_entries(SampleModel) + + assert len(entries) == 2 + assert entries[0].name == "Item 1" + assert entries[0].value == 10 + assert entries[0]._row_id == "row1" + assert entries[1].name == "Item 2" + assert entries[1].value == 20 + + def test_load_entries_empty_spreadsheet(self): + """Test loading entries from empty spreadsheet.""" + backend = self._create_mock_backend_with_spreadsheet() + if backend is None: + return + + backend.sheets_service.spreadsheets().values().get.return_value.execute.return_value = {"values": []} + + entries = backend.load_entries(SampleModel) + + assert entries == [] + + def test_append_entry_success(self): + """Test successful entry appending.""" + backend = self._create_mock_backend_with_spreadsheet() + if backend is None: + return + + backend.sheets_service.spreadsheets().values().append.return_value.execute.return_value = {} + + entry = SampleModel(name="New Item", value=30, description="New description") + row_id = backend.append_entry(entry) + + assert isinstance(row_id, str) + assert len(row_id) > 0 + backend.sheets_service.spreadsheets().values().append.assert_called_once() + + def test_update_entry_success(self): + """Test successful entry update.""" + backend = self._create_mock_backend_with_spreadsheet() + if backend is None: + return + + # Mock finding the row + backend.sheets_service.spreadsheets().values().get.return_value.execute.return_value = { + "values": [["existing_row_id"], ["other_row_id"]] + } + backend.sheets_service.spreadsheets().values().update.return_value.execute.return_value = {} + + entry = SampleModel(name="Updated Item", value=40, description="Updated description") + entry._row_id = "existing_row_id" + + result = backend.update_entry(entry) + + assert result is True + backend.sheets_service.spreadsheets().values().update.assert_called_once() + + def test_update_entry_not_found_appends(self): + """Test update entry creates new entry when row not found.""" + backend = self._create_mock_backend_with_spreadsheet() + if backend is None: + return + + # Mock not finding the row + backend.sheets_service.spreadsheets().values().get.return_value.execute.return_value = { + "values": [["other_row_id"]] + } + 
backend.sheets_service.spreadsheets().values().append.return_value.execute.return_value = {} + + entry = SampleModel(name="New Item", value=50, description="New description") + entry._row_id = "nonexistent_row_id" + + result = backend.update_entry(entry) + + # Should append since row not found + backend.sheets_service.spreadsheets().values().append.assert_called_once() + + def test_delete_entry_success(self): + """Test successful entry deletion.""" + backend = self._create_mock_backend_with_spreadsheet() + if backend is None: + return + + # Mock finding the row + backend.sheets_service.spreadsheets().values().get.return_value.execute.return_value = { + "values": [["row_to_delete"], ["other_row"]] + } + backend.sheets_service.spreadsheets().batchUpdate.return_value.execute.return_value = {} + + result = backend.delete_entry("row_to_delete") + + assert result is True + backend.sheets_service.spreadsheets().batchUpdate.assert_called_once() + + def test_delete_entry_not_found(self): + """Test delete entry when row not found.""" + backend = self._create_mock_backend_with_spreadsheet() + if backend is None: + return + + # Mock not finding the row + backend.sheets_service.spreadsheets().values().get.return_value.execute.return_value = { + "values": [["other_row"]] + } + + result = backend.delete_entry("nonexistent_row") + + assert result is False + backend.sheets_service.spreadsheets().batchUpdate.assert_not_called() + + def test_get_entry_by_field_success(self): + """Test finding entry by field value.""" + backend = self._create_mock_backend_with_spreadsheet() + if backend is None: + return + + # Mock load_entries to return test data + mock_entries = [ + SampleModel(name="Item 1", value=10, description="First item"), + SampleModel(name="Item 2", value=20, description="Second item"), + ] + mock_entries[0]._row_id = "row1" + mock_entries[1]._row_id = "row2" + + with patch.object(backend, 'load_entries', return_value=mock_entries): + entry = backend.get_entry_by_field("name", "Item 1", SampleModel) + + assert entry is not None + assert entry.name == "Item 1" + assert entry.value == 10 + + def test_get_entry_by_field_not_found(self): + """Test finding entry by field value when not found.""" + backend = self._create_mock_backend_with_spreadsheet() + if backend is None: + return + + with patch.object(backend, 'load_entries', return_value=[]): + entry = backend.get_entry_by_field("name", "Nonexistent", SampleModel) + + assert entry is None + + +class TestGDriveBackendIntegration: + """Test integration with Project and Dataset classes.""" + + def test_project_create_gdrive_params(self): + """Test that Project.create accepts Google Drive parameters.""" + try: + with patch('ragas_experimental.backends.gdrive_backend.build'): + with patch('ragas_experimental.backends.gdrive_backend.Credentials'): + with patch('os.path.exists', return_value=True): + project = Project.create( + name="test_project", + backend="gdrive", + gdrive_folder_id="test_folder", + gdrive_service_account_path="fake_path.json" + ) + assert project.backend == "gdrive" + assert project._gdrive_folder_id == "test_folder" + except ImportError: + pytest.skip("Google Drive dependencies not available") + + def test_project_gdrive_validation(self): + """Test that Project validates required Google Drive parameters.""" + with pytest.raises(ValueError, match="gdrive_folder_id is required"): + Project.create( + name="test_project", + backend="gdrive" + # Missing gdrive_folder_id + ) + + def test_get_column_mapping(self): + """Test get_column_mapping 
returns model fields.""" + backend = TestGDriveBackendFolderManagement()._create_mock_backend() + if backend is None: + return + + mapping = backend.get_column_mapping(SampleModel) + + # Should return the model fields directly + assert mapping == SampleModel.model_fields + + def test_str_and_repr(self): + """Test string representations of backend.""" + backend = TestGDriveBackendFolderManagement()._create_mock_backend() + if backend is None: + return + + str_repr = str(backend) + assert "GDriveBackend" in str_repr + assert "test_folder" in str_repr + assert "test_project" in str_repr + + assert str(backend) == repr(backend) + + +if __name__ == "__main__": + pytest.main([__file__]) diff --git a/requirements/gdrive.txt b/requirements/gdrive.txt new file mode 100644 index 000000000..5e90ca32a --- /dev/null +++ b/requirements/gdrive.txt @@ -0,0 +1,4 @@ +# Optional dependencies for Google Drive backend +google-api-python-client>=2.0.0 +google-auth>=2.0.0 +google-auth-oauthlib>=1.0.0
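
After installing these requirements, a quick sanity check is to exercise the conditional export added in `ragas_experimental/backends/__init__.py` (a minimal sketch; it only verifies that the optional dependencies resolve, not that credentials or folder access work):

```python
# Checks whether the optional Google Drive backend was picked up by the
# conditional import in ragas_experimental.backends.
import ragas_experimental.backends as backends

if "GDriveBackend" in backends.__all__:
    print("Google Drive backend is available.")
else:
    print("Not available; run: pip install -r requirements/gdrive.txt")
```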