The Petrol Station Replenishment Problem (PSRP) involves distributing fuel from depots to stations using multi-compartment vehicles over a multi-day horizon. Traditional methods are computationally intensive and focus on single-day solutions, limiting efficiency. This paper proposes a reinforcement learning-based approach to optimize routing and inventory management, reducing costs and improving operational efficiency. By leveraging learned patterns, the model adapts to dynamic constraints and eliminates the need for frequent recalculations, offering a scalable solution to PSRP.
- gym
- torch
- gymnasium
- sb3_contrib
- tensorboard
- torch_geometric
- stable-baselines3[extra]
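A quick way to confirm the packages above are available before training is to probe for them from Python. The following is a minimal sketch, not part of the repository: `REQUIRED_MODULES` and `missing_modules` are hypothetical names, and the import names are assumed to match the list above (with `stable-baselines3[extra]` imported as `stable_baselines3`).

```python
import importlib.util
from typing import Iterable, List

# Import names assumed from the requirements list above;
# stable-baselines3[extra] is imported as stable_baselines3.
REQUIRED_MODULES = [
    "gym",
    "torch",
    "gymnasium",
    "sb3_contrib",
    "tensorboard",
    "torch_geometric",
    "stable_baselines3",
]


def missing_modules(names: Iterable[str] = REQUIRED_MODULES) -> List[str]:
    """Return the module names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages are importable.")
```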
python train.py --n 3 --veh 2 --n_steps 500 --timesteps 10000
Run tensorboard --logdir ppo_tensorboard to monitor the training progress and view the prediction results.
python eval.py --n 3 --veh 2 --n_steps 500
Below are examples of running the models located in data_pkl.
--n [3, 4, 5, 6, 7, 8, 9, 10] - Number of nodes
--veh [2, 4, 6] - Number of vehicles
--n_steps [250, 500, 1000] - Number of steps
--timesteps - Number of training steps
--pre_train [True, False] - Loading a pre-trained model
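The options above could be parsed along the following lines. This is a hypothetical sketch with `argparse`, not the actual parser in `train.py`: the helper names `str2bool` and `build_parser` and all default values are assumptions chosen to match the example commands.

```python
import argparse


def str2bool(value: str) -> bool:
    """Parse the literal strings True/False, since argparse's bool() treats any non-empty string as True."""
    lowered = value.lower()
    if lowered in ("true", "1", "yes"):
        return True
    if lowered in ("false", "0", "no"):
        return False
    raise argparse.ArgumentTypeError(f"expected True or False, got {value!r}")


def build_parser() -> argparse.ArgumentParser:
    # Defaults are assumptions; only the allowed values come from the README.
    parser = argparse.ArgumentParser(description="PSRP training options (sketch)")
    parser.add_argument("--n", type=int, choices=range(3, 11), default=3,
                        help="Number of nodes")
    parser.add_argument("--veh", type=int, choices=[2, 4, 6], default=2,
                        help="Number of vehicles")
    parser.add_argument("--n_steps", type=int, choices=[250, 500, 1000], default=500,
                        help="Number of steps")
    parser.add_argument("--timesteps", type=int, default=10000,
                        help="Number of training steps")
    parser.add_argument("--pre_train", type=str2bool, default=False,
                        help="Loading a pre-trained model")
    return parser


if __name__ == "__main__":
    print(build_parser().parse_args())
```

Restricting `--n`, `--veh`, and `--n_steps` with `choices` makes an unsupported combination fail at the command line rather than later during training.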
The data is provided in a .pkl format, containing a tuple of 11 attributes. Each attribute is represented as a torch.Tensor. These data are used to train the model and define the problem parameters. Below is the data format description:
- positions:
  A 2D tensor of size (n, 2) representing the coordinates of the points.
  Example: tensor([[274., 149.], [ 82., 223.], [ 94., 49.]], dtype=torch.float64)
- weight_matrixes:
  A square matrix of size (n, n) containing weights (e.g., distances between points).
  Example: tensor([[ 0., 12300., 12300.], [12300., 0., 10440.], [12300., 10440., 0.]], dtype=torch.float64)
- daily_demands:
  A 3D tensor of size (d, n, 2), where d is the number of days, n is the number of points, and the last dimension of size 2 holds the demand parameters.
  Example: tensor([[[ 0., 0.], [10., 10.], [10., 10.]], ... [[ 0., 0.], [10., 10.], [10., 10.]]], dtype=torch.float64)
- depots:
  A 1D tensor specifying the node index of the depot.
  Example: tensor([0.], dtype=torch.float64)
- working_time:
  A 1D tensor representing the working time in seconds.
  Example: tensor([32400., 32400.], dtype=torch.float64)
- restriction_matrix:
  A 2D integer tensor describing restrictions or constraints.
  Example: tensor([[0, 0, 0], [0, 0, 0]], dtype=torch.int32)
- service_times:
  A 1D tensor with the service time required at each point.
  Example: tensor([900., 900., 900.], dtype=torch.float64)
- min_capacities:
  A 2D tensor of size (n, 2) representing the minimum capacity limits, that is, the lower bounds that levels must not fall below.
  Example: tensor([[0., 0.], [5., 5.], [5., 5.]], dtype=torch.float64)
- max_capacities:
  A 2D tensor of size (n, 2) representing the maximum capacity limits.
  Example: tensor([[ 0., 0.], [95., 95.], [95., 95.]], dtype=torch.float64)
- init_capacities:
  A 2D tensor of size (n, 2) representing the initial capacities.
  Example: tensor([[ 0., 0.], [50., 50.], [50., 50.]], dtype=torch.float64)
- vehicle_compartments:
  A 3D tensor of size (v, n, 2), where v is the number of vehicles.
  Example: tensor([[[50., 50.], [50., 50.], [50., 50.]], [[50., 50.], [50., 50.], [50., 50.]]], dtype=torch.float64)
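The format described above can be loaded and sanity-checked with a few lines of Python. The sketch below is illustrative only: it assumes the 11 tuple entries are stored in the same order as the list above, and `FIELD_NAMES`, `name_fields`, `load_instance`, and `check_shapes` are hypothetical helper names rather than functions from the repository. Only the shapes stated in the description are checked.

```python
import pickle
from typing import Any, Dict, Sequence

# Assumption: the tuple stores the attributes in the order they are documented.
FIELD_NAMES = (
    "positions", "weight_matrixes", "daily_demands", "depots", "working_time",
    "restriction_matrix", "service_times", "min_capacities", "max_capacities",
    "init_capacities", "vehicle_compartments",
)


def name_fields(data: Sequence[Any]) -> Dict[str, Any]:
    """Attach the documented attribute names to the raw tuple."""
    if len(data) != len(FIELD_NAMES):
        raise ValueError(f"expected {len(FIELD_NAMES)} attributes, got {len(data)}")
    return dict(zip(FIELD_NAMES, data))


def load_instance(path: str) -> Dict[str, Any]:
    """Load a .pkl instance; unpickling torch.Tensor objects requires torch to be installed."""
    with open(path, "rb") as f:
        return name_fields(pickle.load(f))


def check_shapes(inst: Dict[str, Any]) -> int:
    """Verify the documented shapes and return the number of points n."""
    n = tuple(inst["positions"].shape)[0]
    expected = {
        "positions": (n, 2),
        "weight_matrixes": (n, n),
        "min_capacities": (n, 2),
        "max_capacities": (n, 2),
        "init_capacities": (n, 2),
    }
    for name, shape in expected.items():
        actual = tuple(inst[name].shape)
        if actual != shape:
            raise ValueError(f"{name}: expected shape {shape}, got {actual}")
    # daily_demands is (d, n, 2) and vehicle_compartments is (v, n, 2);
    # d and v are not known in advance, so only the trailing dimensions are checked.
    for name in ("daily_demands", "vehicle_compartments"):
        actual = tuple(inst[name].shape)
        if len(actual) != 3 or actual[1:] != (n, 2):
            raise ValueError(f"{name}: expected shape (*, {n}, 2), got {actual}")
    return n
```

A call such as `check_shapes(load_instance("path/to/instance.pkl"))` (the path is a placeholder) would then return n, or raise a `ValueError` naming the first attribute whose shape disagrees with the description.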

