This plan outlines the implementation of MPI-based cluster information collection for the MLPerf Storage benchmark. The feature will collect system information (/proc/meminfo, /proc/cpuinfo, /proc/diskstats, and other relevant data) from all hosts in a distributed cluster using MPI, making it available for rules checking before and after benchmark execution.
- MPI Support: `generate_mpi_prefix_cmd()` in `utils.py` generates MPI command prefixes
- Data Classes: `HostMemoryInfo`, `HostCPUInfo`, `HostInfo`, and `ClusterInformation` exist in `rules.py`
- Parsing Support: `HostMemoryInfo.from_proc_meminfo_dict()` already handles the `/proc/meminfo` format
- Metadata Storage: `Benchmark.write_metadata()` serializes to JSON using `MLPSJsonEncoder`

Gaps:

- `DLIOBenchmark.accumulate_host_info()` uses only the CLI arg `client_host_memory_in_gb` (a single value applied to all hosts)
- No actual collection of real system information from hosts
- No disk statistics collection
- No validation that all nodes have consistent configurations
New File: mlpstorage/cluster_collector.py
This module will handle all MPI-based data collection.
```python
@dataclass
class HostDiskInfo:
    """Disk statistics for a host from /proc/diskstats"""
    device_name: str
    reads_completed: int
    reads_merged: int
    sectors_read: int
    time_reading_ms: int
    writes_completed: int
    writes_merged: int
    sectors_written: int
    time_writing_ms: int
    ios_in_progress: int
    time_doing_ios_ms: int
    weighted_time_doing_ios_ms: int
    # Optional newer fields (kernel 4.18+)
    discards_completed: Optional[int] = None
    discards_merged: Optional[int] = None
    sectors_discarded: Optional[int] = None
    time_discarding_ms: Optional[int] = None
    # Flush fields (kernel 5.5+)
    flush_requests_completed: Optional[int] = None
    time_flushing_ms: Optional[int] = None


@dataclass
class HostNetworkInfo:
    """Network interface statistics from /proc/net/dev"""
    interface_name: str
    rx_bytes: int
    rx_packets: int
    rx_errors: int
    tx_bytes: int
    tx_packets: int
    tx_errors: int


@dataclass
class HostSystemInfo:
    """Extended system information for a host"""
    hostname: str
    kernel_version: str                       # from /proc/version
    os_release: Dict[str, str]                # from /etc/os-release
    uptime_seconds: float                     # from /proc/uptime
    load_average: Tuple[float, float, float]  # from /proc/loadavg
```

Create parsing functions for each `/proc` file:
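As an illustration, a minimal sketch of the `/proc/meminfo` parser (the field-name and kB conventions follow the kernel's meminfo format; this is not the final implementation):

```python
from typing import Dict

def parse_proc_meminfo(content: str) -> Dict[str, int]:
    """Parse /proc/meminfo text into {field: value-in-kB} (sketch)."""
    result: Dict[str, int] = {}
    for line in content.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue  # skip malformed lines
        fields = rest.split()
        if fields and fields[0].isdigit():
            # Most values carry a "kB" suffix; counters like HugePages_Total do not.
            result[key.strip()] = int(fields[0])
    return result
```

For example, `parse_proc_meminfo("MemTotal: 16384 kB")["MemTotal"]` yields `16384`.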
```python
def parse_proc_meminfo(content: str) -> Dict[str, int]:
    """Parse /proc/meminfo content into a dictionary (values in kB)"""

def parse_proc_cpuinfo(content: str) -> List[Dict[str, Any]]:
    """Parse /proc/cpuinfo content into list of CPU dictionaries"""

def parse_proc_diskstats(content: str) -> List[HostDiskInfo]:
    """Parse /proc/diskstats content into list of disk info"""

def parse_proc_net_dev(content: str) -> List[HostNetworkInfo]:
    """Parse /proc/net/dev content into list of network info"""

def parse_proc_version(content: str) -> str:
    """Parse /proc/version to extract kernel version"""

def parse_proc_loadavg(content: str) -> Tuple[float, float, float]:
    """Parse /proc/loadavg to extract load averages"""
```

```python
def collect_local_system_info() -> Dict[str, Any]:
    """
    Collect system information from the local node.

    Returns a dictionary containing:
    - hostname: str
    - meminfo: Dict from /proc/meminfo
    - cpuinfo: List[Dict] from /proc/cpuinfo
    - diskstats: List[Dict] from /proc/diskstats
    - netdev: List[Dict] from /proc/net/dev
    - version: str from /proc/version
    - loadavg: Tuple[float, float, float] from /proc/loadavg
    - uptime: float from /proc/uptime
    - os_release: Dict from /etc/os-release
    """
```

```python
class MPIClusterCollector:
    """
    Collects system information from all nodes in a cluster using MPI.

    This class generates a Python script that will be executed via MPI
    on all nodes to collect and aggregate system information.
    """

    def __init__(self, hosts: List[str], mpi_bin: str, logger,
                 allow_run_as_root: bool = False,
                 timeout_seconds: int = 60):
        self.hosts = hosts
        self.mpi_bin = mpi_bin
        self.logger = logger
        self.allow_run_as_root = allow_run_as_root
        self.timeout = timeout_seconds

    def collect(self) -> Dict[str, Any]:
        """
        Execute MPI collection across all nodes.

        Returns:
            Dictionary mapping hostname -> system_info dict
        """

    def _generate_collector_script(self, output_path: str) -> str:
        """Generate the MPI collector Python script"""

    def _parse_collection_results(self, output_file: str) -> Dict[str, Any]:
        """Parse the JSON output from the MPI collection"""
```

The collector will generate and execute a Python script like:
```python
#!/usr/bin/env python3
"""MPI System Information Collector - Generated by MLPerf Storage"""
import json
import socket
import sys

# parse_meminfo, parse_cpuinfo, parse_diskstats, etc. are inlined into the
# generated script by _generate_collector_script().

def collect_local_info():
    info = {"hostname": socket.gethostname()}

    # Read /proc/meminfo
    try:
        with open("/proc/meminfo", "r") as f:
            info["meminfo"] = parse_meminfo(f.read())
    except Exception as e:
        info["meminfo_error"] = str(e)

    # Read /proc/cpuinfo
    try:
        with open("/proc/cpuinfo", "r") as f:
            info["cpuinfo"] = parse_cpuinfo(f.read())
    except Exception as e:
        info["cpuinfo_error"] = str(e)

    # Read /proc/diskstats
    try:
        with open("/proc/diskstats", "r") as f:
            info["diskstats"] = parse_diskstats(f.read())
    except Exception as e:
        info["diskstats_error"] = str(e)

    # Additional files...
    return info

def main():
    from mpi4py import MPI
    comm = MPI.COMM_WORLD
    rank = comm.Get_rank()

    # Collect local info
    local_info = collect_local_info()

    # Gather all info to rank 0
    all_info = comm.gather(local_info, root=0)

    if rank == 0:
        # Write combined results to the output file given as argv[1]
        output = {info["hostname"]: info for info in all_info}
        with open(sys.argv[1], "w") as f:
            json.dump(output, f, indent=2)

if __name__ == "__main__":
    main()
```

File: mlpstorage/rules.py
```python
@dataclass
class HostInfo:
    """Information about a single host in the system"""
    hostname: str
    memory: HostMemoryInfo = field(default_factory=HostMemoryInfo)
    cpu: Optional[HostCPUInfo] = None
    disks: Optional[List[HostDiskInfo]] = None       # NEW
    network: Optional[List[HostNetworkInfo]] = None  # NEW
    system: Optional[HostSystemInfo] = None          # NEW
    collection_timestamp: Optional[str] = None       # NEW

    @classmethod
    def from_collected_data(cls, data: Dict[str, Any]) -> 'HostInfo':
        """Create HostInfo from MPI-collected data dictionary"""
```

```python
class ClusterInformation:

    def __init__(self, host_info_list: List[HostInfo], logger,
                 calculate_aggregated_info=True):
        # Existing attributes...
        self.total_memory_bytes = 0
        self.total_cores = 0
        # NEW aggregated attributes
        self.num_hosts = len(host_info_list)
        self.min_memory_bytes = 0
        self.max_memory_bytes = 0
        self.collection_method = "unknown"  # "mpi", "args", "dlio_summary"
        self.collection_timestamp = None
        self.host_consistency_issues = []   # List of detected inconsistencies

    @classmethod
    def from_mpi_collection(cls, collected_data: Dict[str, Any],
                            logger) -> 'ClusterInformation':
        """Create ClusterInformation from MPI collector output"""

    def validate_cluster_consistency(self) -> List[str]:
        """
        Check that all nodes have consistent configurations.
        Returns list of warning messages for any inconsistencies.
        """
```

File: mlpstorage/benchmarks/base.py and mlpstorage/benchmarks/dlio.py
```python
class Benchmark(abc.ABC):

    def __init__(self, args, logger=None, run_datetime=None, run_number=0):
        # Existing init...
        # NEW: Collect cluster information before benchmark
        self.cluster_information = None
        if self._should_collect_cluster_info():
            self.cluster_information = self._collect_cluster_information()

    def _should_collect_cluster_info(self) -> bool:
        """Determine if we should collect cluster info"""
        return (hasattr(self.args, 'hosts') and
                self.args.hosts and
                not getattr(self.args, 'skip_cluster_collection', False) and
                self.args.command not in ['datagen', 'configview'])

    def _collect_cluster_information(self) -> Optional[ClusterInformation]:
        """
        Collect cluster information using MPI if available,
        otherwise fall back to CLI args.
        """
        if self.args.exec_type == EXEC_TYPE.MPI:
            try:
                from mlpstorage.cluster_collector import MPIClusterCollector
                collector = MPIClusterCollector(
                    hosts=self.args.hosts,
                    mpi_bin=self.args.mpi_bin,
                    logger=self.logger,
                    allow_run_as_root=self.args.allow_run_as_root
                )
                # Wrap the raw per-host dicts in a ClusterInformation
                return ClusterInformation.from_mpi_collection(
                    collector.collect(), self.logger)
            except Exception as e:
                self.logger.warning(f"MPI collection failed: {e}, falling back to args")

        # Fallback to existing behavior
        return self._collect_cluster_info_from_args()

    def _collect_cluster_info_from_args(self) -> ClusterInformation:
        """Collect cluster info from CLI arguments (existing behavior)"""
```

```python
class DLIOBenchmark(Benchmark):

    def accumulate_host_info(self, args):
        """
        UPDATED: Use MPI-collected data if available,
        otherwise fall back to CLI args.
        """
        # If we already have collected cluster info, use it
        if hasattr(self, 'cluster_information') and self.cluster_information:
            return self.cluster_information
        # Existing fallback behavior...
```

File: mlpstorage/utils.py
```python
class MLPSJsonEncoder(json.JSONEncoder):
    def default(self, obj):
        # Add handling for new dataclasses
        if isinstance(obj, (HostDiskInfo, HostNetworkInfo, HostSystemInfo)):
            return asdict(obj)
        # Existing handling...
```

Consider writing detailed cluster info to a separate file:

```python
def write_cluster_info(self):
    """Write detailed cluster information to cluster_info.json"""
    cluster_info_path = os.path.join(
        self.run_result_output,
        f"{self.BENCHMARK_TYPE.value}_cluster_info.json"
    )
    with open(cluster_info_path, 'w') as f:
        json.dump(self.cluster_information.to_detailed_dict(), f,
                  indent=2, cls=MLPSJsonEncoder)
```

File: mlpstorage/rules.py
```python
class BenchmarkResult:

    def __init__(self, benchmark_result_root_dir, logger):
        # Existing init...
        self.cluster_info = None  # NEW

    def _process_result_directory(self):
        # Existing processing...
        # NEW: Load cluster info from dedicated file or metadata
        cluster_info_pattern = os.path.join(
            self.benchmark_result_root_dir,
            "*_cluster_info.json"
        )
        cluster_info_files = glob.glob(cluster_info_pattern)
        if cluster_info_files:
            with open(cluster_info_files[0], 'r') as f:
                self.cluster_info = json.load(f)
        elif self.metadata and 'cluster_information' in self.metadata:
            self.cluster_info = self.metadata['cluster_information']
```

```python
class BenchmarkRun:

    def __init__(self, benchmark_result=None, benchmark_instance=None, logger=None):
        # Existing init...
        if benchmark_result:
            # Load from files (post-execution)
            self._load_from_result(benchmark_result)
        elif benchmark_instance:
            # Use live instance (pre-execution)
            self._load_from_instance(benchmark_instance)

    def _load_from_result(self, benchmark_result):
        """Load from BenchmarkResult (post-execution verification)"""
        # Use dedicated cluster_info if available
        if benchmark_result.cluster_info:
            self.system_info = ClusterInformation.from_dict(
                benchmark_result.cluster_info,
                self.logger
            )
```

File: mlpstorage/rules.py
```python
class ClusterValidationRulesChecker(RulesChecker):
    """Validates cluster configuration before benchmark execution"""

    def __init__(self, benchmark_run, logger):
        super().__init__(logger)
        self.benchmark_run = benchmark_run

    def check_cluster_consistency(self):
        """Verify all nodes have consistent configurations"""
        issues = self.benchmark_run.system_info.validate_cluster_consistency()
        for issue in issues:
            self.issues.append(Issue(
                validation=PARAM_VALIDATION.OPEN,
                message=issue,
                severity="warning"
            ))

    def check_minimum_memory(self):
        """Verify minimum memory requirements are met"""
        min_memory_gb = self.benchmark_run.system_info.min_memory_bytes / (1024**3)
        if min_memory_gb < MINIMUM_HOST_MEMORY_GB:
            self.issues.append(Issue(
                validation=PARAM_VALIDATION.INVALID,
                message=f"Host memory {min_memory_gb:.1f}GB below minimum {MINIMUM_HOST_MEMORY_GB}GB",
                parameter="host_memory",
                expected=f">= {MINIMUM_HOST_MEMORY_GB}GB",
                actual=f"{min_memory_gb:.1f}GB"
            ))

    def check_mpi_collection_success(self):
        """Verify MPI collection succeeded on all nodes"""
        if self.benchmark_run.system_info.collection_method != "mpi":
            self.issues.append(Issue(
                validation=PARAM_VALIDATION.OPEN,
                message="Cluster info collected via CLI args, not MPI",
                severity="info"
            ))
```

```python
class BenchmarkVerifier:

    def verify(self) -> PARAM_VALIDATION:
        # Existing verification...
        # NEW: Add cluster validation for multi-host runs
        if len(self.benchmark_run.system_info.host_info_list) > 1:
            cluster_checker = ClusterValidationRulesChecker(
                self.benchmark_run,
                self.logger
            )
            cluster_issues = cluster_checker.run_checks()
            self.issues.extend(cluster_issues)
```

File: mlpstorage/cli.py
```python
def add_cluster_collection_arguments(parser):
    """Add arguments for cluster information collection"""
    cluster_group = parser.add_argument_group("Cluster Information Collection")
    cluster_group.add_argument(
        "--collect-cluster-info",
        action="store_true",
        default=True,
        help="Collect detailed system information from all hosts via MPI"
    )
    cluster_group.add_argument(
        "--skip-cluster-collection",
        action="store_true",
        default=False,
        help="Skip MPI-based cluster collection, use CLI args only"
    )
    cluster_group.add_argument(
        "--cluster-collection-timeout",
        type=int,
        default=60,
        help="Timeout in seconds for MPI cluster collection"
    )
```

```
mlpstorage/
├── cluster_collector.py   # NEW: MPI-based cluster collection
├── rules.py               # MODIFIED: Extended data classes, new validators
├── benchmarks/
│   ├── base.py            # MODIFIED: Integration with collection
│   └── dlio.py            # MODIFIED: Use MPI-collected data
├── utils.py               # MODIFIED: JSON encoder updates
├── cli.py                 # MODIFIED: New CLI arguments
└── main.py                # MODIFIED: Collection orchestration
```
```
┌─────────────────────────────────────────────────────────────────────┐
│                          Pre-Execution                              │
├─────────────────────────────────────────────────────────────────────┤
│                                                                     │
│  CLI Args (hosts) ──────────────────────┐                           │
│        │                                │                           │
│        ▼                                ▼                           │
│  ┌──────────────┐                ┌──────────────┐                   │
│  │ MPI Collector│◄───fallback────│  Args-based  │                   │
│  │ (preferred)  │                │  Collection  │                   │
│  └──────────────┘                └──────────────┘                   │
│        │                                │                           │
│        └─────────────┬──────────────────┘                           │
│                      ▼                                              │
│          ┌───────────────────┐                                      │
│          │ ClusterInformation│                                      │
│          └───────────────────┘                                      │
│                      │                                              │
│        ┌─────────────┼─────────────┐                                │
│        ▼             ▼             ▼                                │
│  ┌──────────┐  ┌──────────┐  ┌──────────┐                           │
│  │  Rules   │  │ Benchmark│  │ Metadata │                           │
│  │ Checking │  │ Instance │  │ Storage  │                           │
│  └──────────┘  └──────────┘  └──────────┘                           │
│                                                                     │
└─────────────────────────────────────────────────────────────────────┘
```

```
┌─────────────────────────────────────────────────────────────────────┐
│                         Post-Execution                              │
├─────────────────────────────────────────────────────────────────────┤
│                                                                     │
│  Results Directory                                                  │
│        │                                                            │
│        ▼                                                            │
│  ┌──────────────────┐                                               │
│  │ BenchmarkResult  │◄─── Loads metadata.json + cluster_info.json   │
│  └──────────────────┘                                               │
│        │                                                            │
│        ▼                                                            │
│  ┌──────────────────┐                                               │
│  │  BenchmarkRun    │◄─── Reconstructs ClusterInformation           │
│  └──────────────────┘                                               │
│        │                                                            │
│        ▼                                                            │
│  ┌──────────────────┐                                               │
│  │ BenchmarkVerifier│◄─── Runs rules with full system info          │
│  └──────────────────┘                                               │
│                                                                     │
└─────────────────────────────────────────────────────────────────────┘
```
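The consistency check that feeds the Rules Checking stage can be sketched with plain dictionaries (the real `validate_cluster_consistency()` would operate on `HostInfo` objects; the field names here are illustrative):

```python
from typing import Dict, List

def find_consistency_issues(hosts: Dict[str, dict]) -> List[str]:
    """Compare each host against the first one and report differences (sketch)."""
    issues: List[str] = []
    items = list(hosts.items())
    if not items:
        return issues
    ref_name, ref = items[0]
    for name, info in items[1:]:
        # Fields chosen for illustration; the real check would cover more.
        for key in ("kernel_version", "mem_total_kb", "cpu_count"):
            if info.get(key) != ref.get(key):
                issues.append(
                    f"{name}: {key}={info.get(key)!r} differs from "
                    f"{ref_name} ({ref.get(key)!r})")
    return issues
```

Returning messages rather than raising lets the caller downgrade inconsistencies to warnings, matching the error-handling policy below of allowing the run to continue.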
- Phase 1.1: Implement new data classes (`HostDiskInfo`, etc.)
- Phase 1.2: Implement `/proc` file parsers (standalone, testable)
- Phase 1.3-1.5: Implement `MPIClusterCollector`
- Phase 2: Extend existing data classes in `rules.py`
- Phase 3: Integrate with Benchmark classes
- Phase 4: Update JSON encoder and metadata storage
- Phase 5: Update `BenchmarkResult` and `BenchmarkRun`
- Phase 6: Add cluster validation rules
- Phase 7: Add CLI arguments
- Testing: Unit tests for parsers, integration tests for MPI collection
- Parser functions for each `/proc` file format
- Data class serialization/deserialization
- ClusterInformation aggregation and validation
- MPI collection on a single node (`mpirun -n 1`)
- MPI collection on multiple hosts (requires a test cluster)
- Fallback behavior when MPI fails
- End-to-end metadata storage and retrieval
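The single-node smoke test can drive the generated collector script via `subprocess`; a sketch (the `run_collection` helper and its `mpi_cmd` parameter are illustrative, and passing `mpi_cmd=()` runs the script without an MPI launcher, which is handy on machines without `mpirun`):

```python
import json
import os
import subprocess
import sys
import tempfile

def run_collection(script_path: str, mpi_cmd=("mpirun", "-n", "1"),
                   timeout: int = 60) -> dict:
    """Run a collector script and load the JSON it writes to argv[1] (sketch)."""
    with tempfile.TemporaryDirectory() as tmp:
        out_path = os.path.join(tmp, "cluster_info.json")
        # The collector script takes the output file path as its first argument.
        cmd = list(mpi_cmd) + [sys.executable, script_path, out_path]
        subprocess.run(cmd, check=True, timeout=timeout)
        with open(out_path) as f:
            return json.load(f)
```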
```
mlpstorage/tests/
├── test_cluster_collector.py   # NEW
├── test_proc_parsers.py        # NEW
└── test_rules.py               # MODIFIED
```
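A first test in `test_proc_parsers.py` could pin down a parser's contract; this sketch pairs a reference implementation of `parse_proc_loadavg` with its test (the sample line and implementation are illustrative):

```python
from typing import Tuple

def parse_proc_loadavg(content: str) -> Tuple[float, float, float]:
    """Extract the 1/5/15-minute load averages from /proc/loadavg (sketch)."""
    fields = content.split()
    return float(fields[0]), float(fields[1]), float(fields[2])

def test_parse_proc_loadavg():
    # A /proc/loadavg line: three averages, runnable/total tasks, last PID
    sample = "0.52 0.41 0.30 2/512 12345\n"
    assert parse_proc_loadavg(sample) == (0.52, 0.41, 0.30)
```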
- MPI Not Available: Fall back to CLI args-based collection
- Partial Node Failure: Log warning, continue with available data
- Timeout: Log error, fall back to CLI args
- Permission Denied: Handle gracefully, skip unavailable files
- Inconsistent Data: Flag as warning, allow run to continue
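The fallback behaviors above share one shape: try the preferred collector, log, and degrade on any failure. A sketch, where the two callables stand in for `MPIClusterCollector.collect` and the args-based path:

```python
def gather_with_fallback(collect_via_mpi, collect_from_args, logger):
    """Prefer MPI collection; on any failure, log and fall back (sketch)."""
    try:
        return collect_via_mpi()
    except Exception as exc:  # MPI missing, timeout, node failure, ...
        logger.warning(f"MPI cluster collection failed ({exc}); using CLI args")
        return collect_from_args()
```

Catching broadly here is deliberate: per the policy above, a collection failure should never abort the benchmark run itself.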
- Existing CLI args (`--client-host-memory-in-gb`) remain functional
- MPI collection is the default when hosts are specified
- The `--skip-cluster-collection` flag lets users keep the old behavior
- Metadata format remains compatible (additional fields are optional)
- Existing `BenchmarkResult` loading works with or without new fields
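The optional-field compatibility can be demonstrated with a round-trip on a trimmed stand-in for `HostDiskInfo` (a sketch; the class here is deliberately reduced to three fields):

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional

@dataclass
class DiskInfoStandIn:
    """Trimmed stand-in for HostDiskInfo, showing an optional newer field."""
    device_name: str
    reads_completed: int
    discards_completed: Optional[int] = None  # present only on kernel 4.18+

# A record written by an older version lacks the new field entirely;
# loading it still works because the field defaults to None.
legacy = json.loads('{"device_name": "sda", "reads_completed": 10}')
restored = DiskInfoStandIn(**legacy)
```

Because every new field defaults to `None`, old metadata deserializes cleanly and new metadata simply carries extra keys.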