LegacyService Testing Plan

Overview

This document outlines a comprehensive plan to extend test coverage for the LegacyService in the OmniPath server. The goal is to ensure the service correctly handles all HTTP API parameters before production deployment.

Current Status: 15 test scenarios
Target: 60+ test scenarios
Parameter Coverage: ~30% → ~85%
Data Validation: 4 checks → 50+ checks

Current Test Coverage Summary

The scripts/r-legacy-server-tests.R script currently covers all 5 query types:

interactions (8 scenarios) - Best coverage
intercell (2 scenarios)
annotations (2 scenarios)
enzyme_substrate (2 scenarios)
complexes (1 scenario) - Minimal coverage

Key Gaps Identified

Limited parameter coverage - Many API parameters are untested
Weak data validation - Only 4 scenarios have check functions
Missing edge cases - No limit testing, format variations, or error conditions
Insufficient field testing - Field selection parameter barely tested
No protein-specific queries - The proteins parameter is never used

Test Extension Plan

Phase 1: Enhanced Data Validation (Priority: HIGH)

Goal: Add robust check functions to all existing and new scenarios

Reusable validation helpers to add:

# The helpers below rely on purrr (map*) and stringr (str_detect)
library(purrr)
library(stringr)

# Column existence validation
check_columns_exist <- function(result, required_cols) {
  all(required_cols %in% names(result))
}

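# Column type validation
# type_map: named character vector, column name -> expected class,
# e.g. c(is_directed = 'logical', sources = 'character').
# inherits() compares against class(), so integer columns match
# 'integer', not 'numeric'.
check_column_types <- function(result, type_map) {
  all(map2_lgl(names(type_map), type_map,
    ~inherits(result[[.x]], .y)))
}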

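# Organism filtering validation
# expected_organism may hold several NCBI Taxonomy IDs; %in% also treats NA as a mismatch
check_organism_filter <- function(result, expected_organism) {
  tax_cols <- names(result)[str_detect(names(result), 'ncbi_tax_id')]
  if (length(tax_cols) == 0) return(TRUE)
  all(map_lgl(tax_cols, ~all(result[[.x]] %in% expected_organism)))
}

# Boolean column validation
# Fails when the column is missing instead of passing vacuously
check_boolean_column <- function(result, col_name, expected_value) {
  col_name %in% names(result) &&
    all(result[[col_name]] %in% expected_value)
}

# Resource filtering validation
# Every row must name the resource in at least one of the columns present.
# fixed() matches literally, so the dot in 'hu.MAP' is not a regex wildcard.
check_resource_filter <- function(result, resource_name) {
  cols <- intersect(c('sources', 'resources'), names(result))
  if (length(cols) == 0) return(FALSE)
  hits <- map(cols, ~str_detect(result[[.x]], fixed(resource_name)))
  all(reduce(hits, `|`))
}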

# Non-empty result validation
check_has_rows <- function(result, min_rows = 1) {
  nrow(result) >= min_rows
}

# Field selection validation
check_only_requested_fields <- function(result, requested_fields, base_fields = c('uniprot', 'genesymbol')) {
  allowed_fields <- c(base_fields, requested_fields)
  extra_fields <- setdiff(names(result), allowed_fields)
  length(extra_fields) == 0
}

Actions:

Add column validation checks to verify expected columns exist
Add data type validation for numeric/boolean/character columns
Add value range checks for organism IDs, booleans, enum fields
Add non-empty result checks for queries expected to return data
Add relationship validation to verify filters were applied correctly
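
The value range checks for organism IDs, booleans and enum fields could share one small helper. The following is a minimal sketch in base R; `check_values_in` and the mock data frame are hypothetical and not part of the existing script.

```r
# Hypothetical helper: TRUE when every non-missing value of `col_name`
# is one of `allowed`. A missing column counts as a failure.
check_values_in <- function(result, col_name, allowed) {
  if (!col_name %in% names(result)) return(FALSE)
  values <- result[[col_name]]
  all(values[!is.na(values)] %in% allowed)
}

# Mock result standing in for a parsed server response
mock <- data.frame(
  ncbi_tax_id_source = c(9606, 9606, 10090),
  is_directed = c(TRUE, TRUE, FALSE),
  stringsAsFactors = FALSE
)

check_values_in(mock, 'ncbi_tax_id_source', c(9606, 10090))  # TRUE
check_values_in(mock, 'ncbi_tax_id_source', 9606)            # FALSE
```

Skipping missing values is a design choice made here for illustration; a stricter variant could treat `NA` as a failure.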

Phase 2: Interactions Query Extension (Priority: HIGH)

Current: 8 scenarios → Target: 15+ scenarios

New Test Scenarios

interactions_sources_filter
- Test filtering by multiple specific resources
- Args: resources = c('BioGRID', 'IntAct'), datasets = 'omnipath'
- Check: Verify every row lists at least one of the specified resources
interactions_extra_attrs
- Validate extra_attrs parameter returns additional resource-specific columns
- Args: resources = 'SIGNOR', extra_attrs = TRUE
- Check: Verify presence of resource-specific attribute columns
interactions_unsigned
- Test signed=FALSE filtering for unsigned interactions
- Args: datasets = 'pathwayextra', signed = FALSE
- Check: Verify is_stimulation and is_inhibition are either absent or FALSE in every row
interactions_undirected
- Test directed=FALSE for undirected interactions
- Args: datasets = 'omnipath', directed = FALSE
- Check: Verify is_directed column is FALSE
interactions_limit
- Test SQL LIMIT clause functionality
- Args: datasets = 'omnipath', limit = 100
- Check: Verify at most 100 rows are returned
interactions_fields_comprehensive
- Test multiple field combinations
- Args: fields = c('sources', 'references', 'curation_effort', 'databases')
- Check: Verify all requested fields are present
interactions_protein_specific
- Test with specific protein identifiers
- Args: sources = c('TP53', 'MDM2'), datasets = 'omnipath'
- Check: Verify all rows contain TP53 or MDM2
interactions_pathwayextra
- Test pathwayextra dataset
- Args: datasets = 'pathwayextra', organisms = 9606
- Check: Verify dataset field contains 'pathwayextra'
interactions_ligrecextra
- Test ligrecextra dataset with ligand-receptor interactions
- Args: datasets = 'ligrecextra', fields = 'type'
- Check: Verify type field indicates ligand-receptor relationships
interactions_kinaseextra
- Test kinase interactions
- Args: datasets = 'kinaseextra', types = 'post_translational'
- Check: Verify post-translational modification relationships
interactions_tfregulons
- Test tfregulons dataset
- Args: datasets = 'tfregulons', types = 'transcriptional'
- Check: Verify transcriptional regulation type
interactions_lncrna
- Test lncRNA-mRNA interactions
- Args: datasets = 'lncrna_mrna', entity_types = 'lncrna'
- Check: Verify lncRNA entity types present
interactions_multi_organism
- Test organism parameter validation
- Args: organisms = c(9606, 10090), datasets = 'omnipath'
- Check: Verify organism filtering works correctly
interactions_entity_combinations
- Test complex+protein entity types
- Args: entity_types = c('protein', 'complex'), datasets = 'omnipath'
- Check: Verify both entity types appear in results
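
Two of the scenarios above might look as follows in the list format the script already uses for scenario definitions. This is a sketch only: the column names `source_genesymbol` and `target_genesymbol` are assumed to be returned when `genesymbols = TRUE`, the tags are illustrative, and the mock data frame stands in for a real server response.

```r
proteins <- c('TP53', 'MDM2')

scenarios <- list(
    list(
        id = 'interactions_limit',
        query = 'omnipath_interactions',
        description = 'LIMIT caps the number of returned rows',
        args = list(datasets = 'omnipath', limit = 100),
        check = function(result) nrow(result) <= 100,
        tags = c('core')
    ),
    list(
        id = 'interactions_protein_specific',
        query = 'omnipath_interactions',
        description = 'Every row involves one of the requested proteins',
        args = list(
            sources = proteins,
            datasets = 'omnipath',
            genesymbols = TRUE
        ),
        check = function(result) {
            nrow(result) > 0 &&
            all(result$source_genesymbol %in% proteins |
                result$target_genesymbol %in% proteins)
        },
        tags = c('core', 'validation')
    )
)

# Mock result used to exercise the check without a running server
mock <- data.frame(
    source_genesymbol = c('TP53', 'MDM2'),
    target_genesymbol = c('CDKN1A', 'TP53'),
    stringsAsFactors = FALSE
)

scenarios[[1]]$check(mock)  # TRUE
scenarios[[2]]$check(mock)  # TRUE
```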

Phase 3: Intercell Query Extension (Priority: HIGH)

Current: 2 scenarios → Target: 10+ scenarios

New Test Scenarios

intercell_receiver
- Test receiver=TRUE with receptor categories
- Args: receiver = TRUE, categories = 'receptor'
- Check: Verify receiver column is TRUE
intercell_causality
- Test causality parameter filtering
- Args: causality = 'transmitter', categories = 'ligand'
- Check: Verify transmitter-specific results
intercell_source_composite
- Test source=composite vs resource_specific
- Args: source = 'composite', categories = 'ligand'
- Check: Compare with resource_specific results
intercell_parent_categories
- Test parent category filtering
- Args: parent = 'adhesion'
- Check: Verify hierarchical category filtering
intercell_proteins
- Test specific protein queries
- Args: proteins = c('EGFR', 'ERBB2'), categories = 'receptor'
- Check: Verify only requested proteins appear
intercell_fields
- Test field selection
- Args: fields = c('sources', 'databases'), categories = 'ligand'
- Check: Verify requested fields are present
intercell_multiple_categories
- Test category combinations
- Args: categories = c('ligand', 'receptor')
- Check: Verify both categories present in results
intercell_all_topologies
- Test topology filters individually
- Args: topology = 'secreted', categories = 'ligand'
- Check: Verify secreted topology
intercell_transmitter_secreted
- Test transmitter with secreted flag
- Args: transmitter = TRUE, secreted = TRUE
- Check: Verify both conditions met
intercell_generic_scope
- Test scope=generic with categories
- Args: scope = 'generic', aspect = 'functional'
- Check: Verify generic vs specific scope differences
intercell_locational_aspect
- Test aspect=locational
- Args: aspect = 'locational', topology = 'plasma_membrane_transmembrane'
- Check: Verify locational aspect results
intercell_limit
- Test SQL LIMIT functionality
- Args: categories = 'ligand', limit = 50
- Check: Verify 50 or fewer rows returned
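
Several intercell checks reduce to two questions: do all expected values occur in a column, and are the given flag columns TRUE in every row? The sketch below shows both in base R. The helper names are hypothetical, and the column names `category`, `transmitter` and `secreted` are assumptions about the intercell response.

```r
# TRUE when every expected value occurs at least once in the column
check_all_present <- function(result, col_name, expected) {
  col_name %in% names(result) && all(expected %in% result[[col_name]])
}

# TRUE when each flag column exists and is TRUE in every row (NA fails)
check_flags_true <- function(result, flag_cols) {
  all(flag_cols %in% names(result)) &&
    all(vapply(
      flag_cols,
      function(col) all(result[[col]] %in% TRUE),
      logical(1)
    ))
}

# Mock result standing in for an intercell response
mock <- data.frame(
  category = c('ligand', 'receptor', 'ligand'),
  transmitter = c(TRUE, FALSE, TRUE),
  secreted = c(TRUE, FALSE, TRUE),
  stringsAsFactors = FALSE
)

check_all_present(mock, 'category', c('ligand', 'receptor'))  # TRUE
check_flags_true(mock, c('transmitter', 'secreted'))          # FALSE
check_flags_true(mock[mock$category == 'ligand', ],
                 c('transmitter', 'secreted'))                # TRUE
```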

Phase 4: Enzyme-Substrate Query Extension (Priority: MEDIUM)

Current: 2 scenarios → Target: 8+ scenarios

New Test Scenarios

enzsub_specific_enzyme
- Test enzymes parameter with specific kinases
- Args: enzymes = c('AKT1', 'MAPK1'), modification = 'phosphorylation'
- Check: Verify enzyme column contains only specified enzymes
enzsub_specific_substrate
- Test substrates parameter
- Args: substrates = c('TP53', 'RB1'), modification = 'phosphorylation'
- Check: Verify substrate column contains only specified substrates
enzsub_partners
- Test partners parameter (either enzyme or substrate)
- Args: partners = c('AKT1', 'TP53')
- Check: Verify partners appear as either enzyme or substrate
enzsub_enzyme_substrate_OR
- Test enzyme_substrate=OR logic
- Args: enzymes = 'AKT1', substrates = 'TP53', enzyme_substrate = 'OR'
- Check: Verify OR logic (either condition satisfied)
enzsub_multiple_residues
- Test residues with multiple values
- Args: residues = c('S', 'T', 'Y'), modification = 'phosphorylation'
- Check: Verify residue column contains only S, T, or Y
enzsub_modification_types
- Test different modification types
- Args: modification = 'ubiquitination', organisms = 9606
- Check: Verify modification column contains ubiquitination
enzsub_fields
- Test field selection
- Args: fields = c('curation_effort', 'references', 'sources')
- Check: Verify requested fields are present
enzsub_multiple_resources
- Test resource combination queries
- Args: resources = c('PhosphoSite', 'SIGNOR')
- Check: Verify resources column contains specified resources
enzsub_organism_mice
- Test mouse organism filtering
- Args: organisms = 10090, modification = 'phosphorylation'
- Check: Verify organism filtering
enzsub_limit
- Test SQL LIMIT functionality
- Args: modification = 'phosphorylation', limit = 100
- Check: Verify 100 or fewer rows returned
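
The enzyme_substrate OR scenario needs a check that distinguishes "either condition" from "both conditions". A minimal sketch follows; `check_enzsub_logic` is hypothetical, the column names `enzyme_genesymbol` and `substrate_genesymbol` are assumptions, and the default of `logic` here says nothing about the server's own default.

```r
# Every row must satisfy the enzyme filter, the substrate filter,
# or both, depending on `logic`.
check_enzsub_logic <- function(result, enzymes, substrates,
                               logic = c('OR', 'AND'),
                               enzyme_col = 'enzyme_genesymbol',
                               substrate_col = 'substrate_genesymbol') {
  logic <- match.arg(logic)
  if (!all(c(enzyme_col, substrate_col) %in% names(result))) return(FALSE)
  enz_hit <- result[[enzyme_col]] %in% enzymes
  sub_hit <- result[[substrate_col]] %in% substrates
  if (logic == 'OR') all(enz_hit | sub_hit) else all(enz_hit & sub_hit)
}

# Mock result standing in for an enzyme-substrate response
mock <- data.frame(
  enzyme_genesymbol = c('AKT1', 'CHEK2', 'AKT1'),
  substrate_genesymbol = c('GSK3B', 'TP53', 'TP53'),
  stringsAsFactors = FALSE
)

check_enzsub_logic(mock, 'AKT1', 'TP53', logic = 'OR')   # TRUE
check_enzsub_logic(mock, 'AKT1', 'TP53', logic = 'AND')  # FALSE
```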

Phase 5: Annotations Query Extension (Priority: MEDIUM)

Current: 2 scenarios → Target: 8+ scenarios

New Test Scenarios

annotations_protein_specific
- Test proteins parameter
- Args: proteins = c('TP53', 'EGFR'), resources = 'UniProt_keyword'
- Check: Verify only requested proteins appear
annotations_fields
- Test field selection
- Args: fields = c('value', 'source'), resources = 'UniProt_tissue'
- Check: Verify only requested fields present
annotations_multiple_resources
- Test resource combinations
- Args: resources = c('UniProt_tissue', 'UniProt_keyword')
- Check: Verify both resources present
annotations_entity_types
- Test different entity types
- Args: entity_types = 'mirna', resources = 'miRBase' (if available)
- Check: Verify entity type filtering
annotations_subcellular
- Test UniProt subcellular location
- Args: resources = 'UniProt_location', genesymbols = TRUE
- Check: Verify subcellular location annotations
annotations_go_terms
- Test GO annotations (if available)
- Args: resources = 'GO_' (check available GO resources)
- Check: Verify GO term structure
annotations_pathway
- Test pathway annotations
- Args: resources = c('KEGG', 'Reactome') (if available)
- Check: Verify pathway annotation format
annotations_limit
- Test SQL LIMIT functionality
- Args: resources = 'UniProt_keyword', limit = 100
- Check: Verify 100 or fewer rows returned
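
Several scenarios above are marked "(if available)". One way to honour that is to skip a scenario when the resources it requests are not served by the instance under test. The sketch below assumes the list of available resources has already been obtained elsewhere; `scenario_is_runnable` and the hard-coded `available` vector are hypothetical.

```r
# A scenario is runnable when it requests no resources,
# or when every requested resource is available.
scenario_is_runnable <- function(scenario, available_resources) {
  wanted <- scenario$args$resources
  is.null(wanted) || all(wanted %in% available_resources)
}

scenarios <- list(
  list(id = 'annotations_pathway',
       args = list(resources = c('KEGG', 'Reactome'))),
  list(id = 'annotations_limit',
       args = list(resources = 'UniProt_keyword', limit = 100)),
  list(id = 'annotations_protein_specific',
       args = list(proteins = c('TP53', 'EGFR')))
)

# Stand-in for the result of a resource discovery call
available <- c('UniProt_keyword', 'UniProt_tissue', 'KEGG')

runnable <- vapply(scenarios, scenario_is_runnable, logical(1),
                   available_resources = available)
ids <- vapply(scenarios, function(s) s$id, character(1))

ids[runnable]   # scenarios to run
ids[!runnable]  # scenarios to report as skipped
```

Reporting skipped scenarios separately keeps a missing resource from being mistaken for a failed check.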

Phase 6: Complexes Query Extension (Priority: MEDIUM)

Current: 1 scenario → Target: 6+ scenarios

New Test Scenarios

complexes_corum
- Test CORUM database
- Args: resources = 'CORUM'
- Check: Verify CORUM resource in results
complexes_complexportal
- Test ComplexPortal database
- Args: resources = 'ComplexPortal'
- Check: Verify ComplexPortal annotations
complexes_proteins
- Test protein-specific complex queries
- Args: proteins = c('TP53', 'MDM2')
- Check: Verify complexes containing specified proteins
complexes_fields
- Test field selection
- Args: resources = 'CORUM', fields = c('sources', 'databases')
- Check: Verify requested fields present
complexes_multiple_resources
- Test resource combinations
- Args: resources = c('CORUM', 'ComplexPortal', 'hu.MAP')
- Check: Verify all resources present
complexes_cellphonedb
- Test CellPhoneDB complexes
- Args: resources = 'CellPhoneDB'
- Check: Verify CellPhoneDB complex annotations
- Tags: full-db
complexes_limit
- Test SQL LIMIT functionality
- Args: resources = 'hu.MAP', limit = 50
- Check: Verify 50 or fewer rows returned
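
For complexes_proteins, the check has to look inside each complex rather than compare whole cell values. The sketch below assumes that complex members are stored as gene symbols joined by an underscore in a column named `components_genesymbols`; both the column name and the separator are assumptions, and `check_complex_members` is hypothetical.

```r
# Every complex must contain at least one of the requested proteins
check_complex_members <- function(result, proteins,
                                  col_name = 'components_genesymbols',
                                  sep = '_') {
  if (!col_name %in% names(result) || nrow(result) == 0) return(FALSE)
  members <- strsplit(as.character(result[[col_name]]), sep, fixed = TRUE)
  all(vapply(members, function(m) any(m %in% proteins), logical(1)))
}

# Mock result standing in for a complexes response
mock <- data.frame(
  components_genesymbols = c('MDM2_TP53', 'MDM2_MDM4', 'EP300_TP53'),
  stringsAsFactors = FALSE
)

check_complex_members(mock, c('TP53', 'MDM2'))  # TRUE
check_complex_members(mock, 'TP53')             # FALSE
```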

Phase 7: Cross-Cutting Concerns (Priority: MEDIUM)

Actions for ALL query types:

Limit parameter testing
- Add limit parameter to representative scenarios
- Verify SQL LIMIT clause correctly restricts result count
- Note: This is NOT pagination, just SQL query limiting
Format testing
- Test format=tsv on select scenarios
- Verify TSV parsing works correctly with OmnipathR
- Compare TSV vs JSON results for consistency
Empty result handling
- Create scenarios with filters that return no results
- Verify graceful handling of empty data frames
- Check for proper error messages vs empty results
Large result sets
- Test queries without filters (tag as 'full-db')
- Monitor performance and memory usage
- Verify data integrity on large responses
Field selection validation
- Verify only requested fields are returned
- Test field combinations across query types
- Validate special fields like 'evidences' (JSON)
genesymbols parameter
- Test TRUE vs FALSE consistently across query types
- Verify gene symbol columns appear/disappear correctly
- Check UniProt ID vs gene symbol consistency

What we find important from the points above:
- Empty result handling
- Test JSON format
- Test for error messages
- Factor out repetitive checks into small helper functions
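
Comparing the same query across formats requires a comparison that ignores row order, column order and numeric storage type. The following base R sketch does this by normalising both data frames first. `normalize_frame` and `frames_equivalent` are hypothetical helpers, and the two mock frames stand in for the parsed TSV and JSON responses.

```r
# Sort columns by name, convert all values to character, sort rows
normalize_frame <- function(df) {
  if (ncol(df) == 0) return(df)
  df <- df[, sort(names(df)), drop = FALSE]
  df[] <- lapply(df, as.character)
  if (nrow(df) > 1) {
    df <- df[do.call(order, unname(as.list(df))), , drop = FALSE]
  }
  rownames(df) <- NULL
  df
}

frames_equivalent <- function(a, b) {
  setequal(names(a), names(b)) &&
    nrow(a) == nrow(b) &&
    identical(normalize_frame(a), normalize_frame(b))
}

# Same content, different column order, row order and numeric type
from_tsv <- data.frame(
  source = c('P04637', 'Q00987'),
  n_references = c(3, 1),
  stringsAsFactors = FALSE
)
from_json <- data.frame(
  n_references = c(1L, 3L),
  source = c('Q00987', 'P04637'),
  stringsAsFactors = FALSE
)

frames_equivalent(from_tsv, from_json)        # TRUE
frames_equivalent(from_tsv, from_json[1, ])   # FALSE
```

Converting everything to character is deliberately blunt; values that the two formats serialise differently (for example booleans) would need their own normalisation step.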

Phase 8: Enhanced Check Functions (Priority: HIGH)

Implementation plan for validation helpers:

Add helper functions section after the parse_bool function
Use helpers consistently across all scenarios
Create test-specific validators for complex checks
Document validation logic with comments

Example enhanced scenario:

list(
    id = 'interactions_comprehensive_check',
    query = 'omnipath_interactions',
    description = 'Comprehensive validation of interaction query',
    args = list(
        organisms = 9606,
        datasets = 'omnipath',
        resources = 'SIGNOR',
        genesymbols = TRUE,
        fields = c('sources', 'references', 'is_directed', 'is_stimulation')
    ),
    check = function(result) {
        check_columns_exist(result, c('source', 'target', 'sources', 'references')) &&
        check_columns_exist(result, c('source_genesymbol', 'target_genesymbol')) &&
        check_resource_filter(result, 'SIGNOR') &&
        check_has_rows(result, min_rows = 1) &&
        check_boolean_column(result, 'is_directed', TRUE)
    },
    tags = c('smoke', 'validation')
)
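
A check built from chained helpers can fail in three ways: it returns FALSE, it returns NA, or it throws an error (for example when a column is missing). The sketch below shows one way a runner could fold all three into a plain failure. `run_check` is hypothetical; how the existing script invokes check functions is not shown in this plan.

```r
# Returns TRUE only when the check ran and returned exactly TRUE.
# Returns NA when the scenario defines no check at all.
run_check <- function(scenario, result) {
    if (!is.function(scenario$check)) return(NA)
    outcome <- tryCatch(
        scenario$check(result),
        error = function(e) FALSE
    )
    isTRUE(outcome)
}

mock <- data.frame(flag = c(TRUE, NA))

passing <- list(id = 'a', check = function(r) nrow(r) > 0)
na_result <- list(id = 'b', check = function(r) all(r$flag))
throwing <- list(id = 'c', check = function(r) stop('column not found'))
unchecked <- list(id = 'd')

run_check(passing, mock)    # TRUE
run_check(na_result, mock)  # FALSE
run_check(throwing, mock)   # FALSE
run_check(unchecked, mock)  # NA
```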

Phase 9: Error Condition Testing (Priority: LOW)

Goal: Verify graceful error handling

Invalid parameters
- Test with invalid organism IDs (e.g., organisms = 9999999)
- Test with non-existent resources
- Verify appropriate error messages
Malformed queries
- Test with incorrect parameter types (if not handled by OmnipathR)
- Test with out-of-range values
Empty filters
- Test contradictory parameter combinations
- Verify empty result vs error distinction
Missing required params
- Identify any required parameters
- Test behavior when omitted
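
To tell an empty result apart from an error, each error-condition scenario can record which of the three outcomes occurred. This is a minimal sketch: `classify_outcome` is hypothetical, and the three closures stand in for real client calls.

```r
# Runs a zero-argument query function and reports what happened:
# 'rows', 'empty' or 'error' (with the error message preserved).
classify_outcome <- function(query_fun) {
  tryCatch({
    result <- query_fun()
    status <- if (is.data.frame(result) && nrow(result) == 0) {
      'empty'
    } else {
      'rows'
    }
    list(status = status, message = NA_character_)
  }, error = function(e) {
    list(status = 'error', message = conditionMessage(e))
  })
}

# Stand-ins for real queries
with_rows <- function() data.frame(source = 'P04637')
no_rows <- function() data.frame(source = character(0))
failing <- function() stop('unknown organism: 9999999')

classify_outcome(with_rows)$status  # "rows"
classify_outcome(no_rows)$status    # "empty"
classify_outcome(failing)$status    # "error"
classify_outcome(failing)$message   # "unknown organism: 9999999"
```

A scenario can then state its expected status explicitly, so that a server change from "empty result" to "error" (or the reverse) shows up as a test failure.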

Implementation Strategy

Quick Wins (Week 1)

Add validation helper functions to the script
Add check functions to all 15 existing scenarios
Implement 5 high-priority interactions scenarios
Implement 5 high-priority intercell scenarios
Run test suite and fix any failures

Medium-term (Week 2-3)

Complete all Phase 2 interactions extensions (15+ scenarios)
Complete all Phase 3 intercell extensions (10+ scenarios)
Implement Phase 4 enzsub scenarios (8+ scenarios)
Implement Phase 5 annotations scenarios (8+ scenarios)
Add cross-cutting limit tests

Long-term (Week 4+)

Complete Phase 6 complexes scenarios (6+ scenarios)
Add cross-cutting tests (format, empty results, large datasets)
Implement Phase 9 error condition tests
Performance testing for large queries
Documentation of test patterns and conventions

Test Execution Strategy

Enhanced Tag System

smoke - Fast, critical path tests (5-8 scenarios, < 30 seconds total)
core - Standard test suite (30-40 scenarios, < 5 minutes total)
comprehensive - All parameter combinations (60+ scenarios, < 15 minutes)
full-db - Tests requiring complete database (may be slow)
validation - Tests with complex check functions
edge-case - Error conditions and boundaries
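
Selecting scenarios by tag could be as simple as the sketch below, which keeps scenarios carrying any wanted tag and drops those carrying an excluded one (for example `full-db` during fast iteration). `select_by_tags` and the sample scenarios are hypothetical.

```r
# Keep scenarios with at least one tag in `include`
# and no tag in `exclude`.
select_by_tags <- function(scenarios, include, exclude = character(0)) {
  keep <- vapply(scenarios, function(s) {
    any(s$tags %in% include) && !any(s$tags %in% exclude)
  }, logical(1))
  scenarios[keep]
}

scenarios <- list(
  list(id = 'interactions_basic', tags = c('smoke', 'core')),
  list(id = 'interactions_limit', tags = c('core')),
  list(id = 'complexes_cellphonedb', tags = c('core', 'full-db')),
  list(id = 'interactions_invalid_organism', tags = c('edge-case'))
)

ids <- function(x) vapply(x, function(s) s$id, character(1))

ids(select_by_tags(scenarios, include = 'smoke'))
ids(select_by_tags(scenarios, include = 'core', exclude = 'full-db'))
```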

Command-line Usage Examples

# Quick validation (smoke tests only)
Rscript scripts/r-legacy-server-tests.R

# Core test suite
Rscript scripts/r-legacy-server-tests.R --scenario='*_core'

# With full database
OMNIPATH_FULL_DB=1 Rscript scripts/r-legacy-server-tests.R

# Specific query type
Rscript scripts/r-legacy-server-tests.R --scenario='interactions_*'

# List all available scenarios
Rscript scripts/r-legacy-server-tests.R --list-scenarios

# Run specific scenario by ID
Rscript scripts/r-legacy-server-tests.R --scenario=interactions_basic
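
The wildcard patterns passed to --scenario can be matched against scenario IDs with `glob2rx()` from the utils package, which converts a glob into a regular expression. The sketch below is an assumption about how such matching might work; it does not describe how the script currently parses its arguments, and `match_scenarios` is hypothetical.

```r
# Return the scenario IDs matching a shell-style glob pattern
match_scenarios <- function(ids, pattern) {
  ids[grepl(glob2rx(pattern), ids)]
}

# Stand-in for commandArgs(trailingOnly = TRUE)
args <- c('--scenario=interactions_*')
pattern <- sub('^--scenario=', '', args[startsWith(args, '--scenario=')])

ids <- c('interactions_basic', 'interactions_limit',
         'intercell_limit', 'complexes_limit')

match_scenarios(ids, pattern)    # "interactions_basic" "interactions_limit"
match_scenarios(ids, '*_limit')  # "interactions_limit" "intercell_limit" "complexes_limit"
```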

CI/CD Integration

Pre-commit hook - Run smoke tests (< 30 seconds)
Pull request checks - Run core tests (< 5 minutes)
Nightly builds - Run comprehensive + full-db tests
Release validation - Full test suite with error condition tests

Expected Outcomes

| Metric | Current | Target |
|---|---|---|
| Total test scenarios | 15 | 60+ |
| Parameter coverage | ~30% | ~85% |
| Data validation checks | 4 | 50+ |
| Query types with 10+ tests | 0 | 3 (interactions, intercell, enzsub) |
| Test execution time (core) | ~2 min | ~5 min |
| Confidence for production | Medium | High |

Success Criteria for Production Deployment

Before deploying to production, ensure:

Test Coverage
- All 5 query types have at least 6 test scenarios
- All major API parameters are tested at least once
- At least 50% of scenarios have validation checks
Test Results
- 100% of smoke tests pass
- 95%+ of core tests pass
- All critical parameters validated
Data Integrity
- Validation checks confirm correct filtering
- Field selection works correctly
- Organism filtering validated
- Resource filtering validated
Performance
- Core test suite completes in < 5 minutes
- No memory issues with large queries
- No timeout issues with complex queries
Compatibility
- OmnipathR client handles all responses correctly
- Results match old API (for regression testing)
- Format variations (JSON/TSV) work correctly
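
The pass-rate thresholds under Test Results can be evaluated mechanically from a table of scenario outcomes. The sketch below uses made-up outcomes purely to show the arithmetic; the `results` data frame and `pass_rate` helper are hypothetical.

```r
# Fraction of scenarios with the given tag that passed
pass_rate <- function(results, tag) {
  mean(results$passed[results$tag == tag])
}

# Made-up outcomes: 5 smoke tests (all pass), 40 core tests (1 fails)
results <- data.frame(
  tag = c(rep('smoke', 5), rep('core', 40)),
  passed = c(rep(TRUE, 5), rep(TRUE, 39), FALSE),
  stringsAsFactors = FALSE
)

smoke_ok <- pass_rate(results, 'smoke') == 1    # 5 / 5 = 100%
core_ok <- pass_rate(results, 'core') >= 0.95   # 39 / 40 = 97.5%

smoke_ok && core_ok  # TRUE
```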

Notes and Considerations

Limit parameter: Maps to a SQL LIMIT clause and is not pagination; it is useful for testing but does not cover production pagination needs
Full-db scenarios: Should be clearly tagged and skippable for fast iteration
Cache management: Reset OmnipathR cache between test runs to avoid false positives
Trace logging: Use OmnipathR:::.optrace() for debugging
Parallel execution: Consider parallelizing independent scenarios for faster CI/CD
Resource discovery: Use metadata endpoints to discover available resources for test scenarios

References

API Documentation: https://r.omnipathdb.org/articles/bioc_workshop.html
Query Parameters: https://omnipathdb.org/queries/{query_type}
OmnipathR Package: https://github.com/saezlab/OmnipathR
Project README: README.md
Contributor Guide: AGENTS.md
Current Test Script: scripts/r-legacy-server-tests.R
