Name	Name	Last commit message	Last commit date
parent directory ..
atlas	atlas
multi_modality	multi_modality
result_analysis	result_analysis
single_modality	single_modality
spatial	spatial
tuning	tuning
dataset_server.json	dataset_server.json
heart_ans.csv	heart_ans.csv
readme.md	readme.md

How to Add New Algorithms to the Auto-Search Framework

This document explains how to integrate new algorithms into the project's automatic search framework.

Implementation Requirements

1. Inherit Base Classes

New algorithms should inherit from appropriate base classes in dance.modules.base:

from dance.modules.base import (
    BaseClusteringMethod,    # Clustering algorithms
    BaseClassificationMethod,  # Classification algorithms
    BaseRegressionMethod,    # Regression algorithms
    TorchNNPretrain         # If pretraining is needed
)

class YourMethod(BaseClusteringMethod):
    """Your method description."""
    pass

2. Implement Required Interfaces

All algorithms must implement these core interfaces:

def preprocessing_pipeline(**kwargs) -> BaseTransform:
    """Define data preprocessing pipeline"""
    ...

def fit(self, x, y=None, **kwargs):
    """Train the model

    Parameters
    ----------
    x : array-like
        Input features
    y : array-like, optional
        Labels (required for supervised learning)
    """
    ...

def predict(self, x):
    """Predict results

    Parameters
    ----------
    x : array-like
        Input features

    Returns
    -------
    array-like
        Prediction results
    """
    ...

The base class provides a default implementation of the score() method, which:

Calls predict() to get prediction results
Calculates scores using predefined evaluation metrics
- Clustering algorithms: ARI (Adjusted Rand Index)
- Classification algorithms: Accuracy
- Regression algorithms: MSE (Mean Squared Error)

Directory Structure

New algorithms should follow this directory structure:

    examples/tuning/
    └── [task_name]_[algorithm_name]/
        ├── main.py # Main execution file
        └── [dataset_name]/ # Dataset related configurations
            ├── pipeline_params_tuning_config.yaml
            └── config_yamls
                ├── 0_test_acc_params_tuning_config.yaml
                ├── 1_test_acc_params_tuning_config.yaml
                └── 2_test_acc_params_tuning_config.yaml

Integration Steps

1. Create Algorithm Directory

Create a new algorithm directory under examples/tuning/, named as [task_name]_[algorithm_name].

2. Implement Main Execution File

Create main.py in the algorithm directory with these key components:

Parameter Configuration

parser = argparse.ArgumentParser()
# Add necessary parameters
parser.add_argument("--data_dir", default="../temp_data"，help="Directory path containing the input data files")
parser.add_argument("--dataset", type=str, choices=[...]，help="Dataset name")
parser.add_argument("--tune_mode", default="pipeline_params",
                   choices=["pipeline", "params", "pipeline_params"]， help="Tuning mode: 'pipeline' for pipeline tuning only, 'params' for parameter tuning only, 'pipeline_params' for both")
parser.add_argument("--sweep_id", type=str, default=None，help="Existing sweep ID to resume. If None, creates a new sweep")
parser.add_argument("--count", type=int, default=2，help="Number of times to run the sweep agent")
parser.add_argument(
        "--summary_file_path",
        default="results/pipeline/best_test_acc.csv",
        type=str,
        help="Path to save the summary results file"
    )
parser.add_argument(
        "--root_path",
        default=str(Path(__file__).resolve().parent),
        type=str,
        help="Root directory path for saving results and configuration files"
    )
# ... other model-specific parameters ...
args = parser.parse_args()

Evaluation Function Definition

def evaluate_pipeline(tune_mode, pipeline_planer):
    # Initialize wandb
    wandb.init(settings=wandb.Settings(start_method='thread'))

    # Load data according to the task
    dataloader = TaskDataset(args.data_dir, args.dataset)
    data = dataloader.load_data(cache=args.cache)

    # Apply preprocessing pipeline
    kwargs = {tune_mode: dict(wandb.config)}
    preprocessing_pipeline = pipeline_planer.generate(**kwargs)
    preprocessing_pipeline(data)

    # Get processed data
    inputs, y = data.get_data(return_type="default")

    # Initialize and train model
    model = YourModel(model_params)
    model.fit(inputs, y)

    # Evaluate and log results
    score = model.score(None, y)
    wandb.log({"acc": score})
    wandb.finish()

Main Program Flow

if __name__ == "__main__":
    # Initialize pipeline planer
    pipeline_planer = PipelinePlaner.from_config_file(
        f"{file_root_path}/{args.tune_mode}_tuning_config.yaml")

    # Run hyperparameter search
    entity, project, sweep_id = pipeline_planer.wandb_sweep_agent(
        evaluate_pipeline,
        sweep_id=args.sweep_id,
        count=args.count
    )

    # Save results
    save_summary_data(
        entity, project, sweep_id,
        summary_file_path=args.summary_file_path,
        root_path=file_root_path
    )
    if args.tune_mode == "pipeline" or args.tune_mode == "pipeline_params":
      #generate step3_default_params.yaml
        get_step3_yaml(result_load_path=f"{args.summary_file_path}", step2_pipeline_planer=pipeline_planer,
                       conf_load_path=f"{Path(args.root_path).resolve().parent}/step3_default_params.yaml",
                       root_path=file_root_path, required_funs=["SaveRaw", "UpdateRaw", "NeighborGraph", "SetConfig"],
                       required_indexes=[2, 5, sys.maxsize - 1, sys.maxsize], metric="acc")
        if args.tune_mode == "pipeline_params":
            #run step3
            run_step3(file_root_path, evaluate_pipeline, tune_mode="params", step2_pipeline_planer=pipeline_planer)

3. Configuration File Setup

Create corresponding configuration files in the dataset directory to guide the hyperparameter search process.

Configuration File Types

pipeline_params_tuning_config.yaml: Main configuration file for joint search
config_yamls/*.yaml: Parameter search configuration files automatically generated by the system

Search Modes Explanation

The system supports three search modes (specified by tune_mode):

pipeline mode
- Only searches for optimal preprocessing pipeline combinations
- Uses pipeline_tuning_config.yaml
params mode
- Only searches for optimal model parameter combinations
- Uses params_tuning_config.yaml
pipeline_params mode
- Performs two-stage joint search
- First stage: searches for optimal preprocessing pipeline
- Second stage: searches for optimal model parameters based on the best pipeline
- System automatically generates parameter search config files (e.g., config_yamls/0_test_acc_params_tuning_config.yaml)

Configuration file example:

# pipeline_params_tuning_config.yaml
type: preprocessor
tune_mode: pipeline_params
pipeline_tuning_top_k: 3 #topk for pipeline tuning to use parameter tuning
parameter_tuning_freq_n: 20 #frequency for parameter tuning
pipeline:
  - type: filter.gene
    include:
      - FilterGenesPercentile
      - FilterGenesScanpyOrder
      - FilterGenesPlaceHolder
    default_params:
      FilterGenesScanpyOrder:
        order: ["min_counts", "min_cells", "max_counts", "max_cells"]
        min_counts: 0.01
        max_counts: 0.99
        min_cells: 0.01
        max_cells: 0.99
  - type: feature.cell
    include:
      - WeightedFeaturePCA
      - WeightedFeatureSVD
      - CellPCA
      - CellSVD
      - GaussRandProjFeature  # Registered custom preprocessing func
      - FeatureCellPlaceHolder
    params:
      out: feature.cell
      log_level: INFO
    default_params:
      WeightedFeaturePCA:
        split_name: train
      WeightedFeatureSVD:
        split_name: train
  - type: misc
    target: SetConfig
    params:
      config_dict:
        feature_channel: feature.cell
        label_channel: cell_type
wandb:
  entity: xxxxx
  project: xxxxx
  method: grid #try grid to provide a comprehensive search
  metric:
    name: acc  # val/acc
    goal: maximize

4. Run Tests

After integration, test using these commands:

    # Search preprocessing pipeline only
    python main.py --tune_mode pipeline
    # Search model parameters only
    python main.py --tune_mode params
    # Joint search
    python main.py --tune_mode pipeline_params

Notes

Ensure the model implements fit() and score() interfaces
wandb configuration should correspond to model parameters
Recommend testing on small datasets first

Examples

Refer to examples/tuning/cluster_graphsc and examples/tuning/cta_celltypist implementations.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

readme.md

How to Add New Algorithms to the Auto-Search Framework

Implementation Requirements

1. Inherit Base Classes

2. Implement Required Interfaces

Directory Structure

Integration Steps

1. Create Algorithm Directory

2. Implement Main Execution File

3. Configuration File Setup

Configuration File Types

Search Modes Explanation

4. Run Tests

Notes

Examples

FilesExpand file tree

examples

Directory actions

More options

Directory actions

More options

Latest commit

History

examples

Folders and files

parent directory

readme.md

How to Add New Algorithms to the Auto-Search Framework

Implementation Requirements

1. Inherit Base Classes

2. Implement Required Interfaces

Directory Structure

Integration Steps

1. Create Algorithm Directory

2. Implement Main Execution File

3. Configuration File Setup

Configuration File Types

Search Modes Explanation

4. Run Tests

Notes

Examples