@@ -21,7 +21,7 @@ Yes, this will be a complement to ray core's ability to flexibly schedule actors
2121## Stewardship
2222### Required Reviewers
2323
24- @wumuzi520 SenlinZhu @WangTaoTheTonic @scv119 (Chen Shen) @jjyao (Jiajun Yao)
24+ @wumuzi520 SenlinZhu @Chong Li @scv119 (Chen Shen) @jjyao (Jiajun Yao)
2525### Shepherd of the Proposal (should be a senior committer)
2626
2727
@@ -335,7 +335,7 @@ ActorHandle<Counter> actor6 =
335335### Implementation plan
336336Now there are two modes of scheduling: GCS mode scheduling and raylet scheduling.
337337It will be simpler to implement in GCS mode.
338- #### GCS Scheduling Mode Implementation plan
338+ #### 1. GCS Scheduling Mode Implementation plan
339339
3403401. Add a Labels property to the actor, stored in the GcsActor structure.
3413412. Add a GcsLabelManager to the GCS server. After each actor is scheduled, add its labels -> node information to the GcsLabelManager.
@@ -397,10 +397,107 @@ Main data structure :
397397Map<label_key, Map<label_value, Set<node_id> > > label_to_nodes_
398398Map<node_id, Set<GcsActor > > node_to_actors_
399399```
400- #### Raylet Scheduling Mode Implementation plan
400+ #### 2. Raylet Scheduling Mode Implementation plan
401401The Raylet scheduling mode is implemented in the same way as the GCS scheduling mode above.
402402The main difference is that the Labels information must additionally be synchronized to all Raylet nodes.
403403
404+ 1. Add the actor_labels field to the resource synchronization data structures (ResourcesData and NodeResources).
405+ ```
406+ message ResourcesData {
407+   // Node id.
408+   bytes node_id = 1;
409+   // Resource capacity currently available on this node manager.
410+   map<string, double> resources_available = 2;
411+   // Indicates whether available resources is changed. Only used when light
412+   // heartbeat enabled.
413+   bool resources_available_changed = 3;
414+
415+   // Map<key, Map<value, reference_count>>: labels of the actors scheduled to this node (pseudocode type).
416+   map<string, map<string, int32>> actor_labels = 15;
417+   // Whether the actor labels of this node have changed.
418+   bool actor_labels_changed = 16;
419+ }
420+
421+
422+ NodeResources {
423+   ResourceRequest total;
424+   ResourceRequest available;
425+   /// Only used by light resource report.
426+   ResourceRequest load;
427+   /// Resources owned by normal tasks.
428+   ResourceRequest normal_task_resources;
429+   /// Actors scheduled to this node and actor labels information.
430+   absl::flat_hash_map<string, absl::flat_hash_map<string, int>> actor_labels;
431+ }
432+ ```
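
The `actor_labels` field above maps each label key to its values, and each value to a reference count. A minimal Python sketch of why the count is needed follows. The class and method names are hypothetical and are not part of the proposal. Two actors on the same node can carry the same label, so a (key, value) pair should disappear only when the last such actor is removed.

```python
from collections import defaultdict
from typing import Dict


class NodeActorLabels:
    """Hypothetical per-node table: label key -> label value -> reference count."""

    def __init__(self) -> None:
        self.counts: Dict[str, Dict[str, int]] = defaultdict(dict)

    def add_actor_labels(self, labels: Dict[str, str]) -> None:
        # One more actor on this node carries each (key, value) pair.
        for key, value in labels.items():
            self.counts[key][value] = self.counts[key].get(value, 0) + 1

    def remove_actor_labels(self, labels: Dict[str, str]) -> None:
        # Drop a pair only when the last actor carrying it is gone.
        for key, value in labels.items():
            values = self.counts.get(key)
            if not values or value not in values:
                continue
            values[value] -= 1
            if values[value] == 0:
                del values[value]
            if not values:
                del self.counts[key]

    def has(self, key: str, value: str) -> bool:
        # True while at least one actor on this node carries the pair.
        return self.counts.get(key, {}).get(value, 0) > 0
```

With this shape, removing one of two actors labeled `location=dc_1` leaves the node visible to affinity lookups for that label.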
433+
434+ 2. Adapt the places where ResourcesData is constructed and consumed in the resource synchronization mechanism:
435+    a. NodeManager::HandleRequestResourceReport
436+    b. NodeManager::HandleUpdateResourceUsage
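
As a rough illustration of the two sides of this synchronization, the Python sketch below builds a report that carries labels only when they changed and applies it on the receiving side. It assumes `actor_labels_changed` works like `resources_available_changed` under light heartbeat. `ResourcesReport`, `ReportingNode`, and `apply_report` are hypothetical names, and mapping them onto the two handlers listed above is also an assumption.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

LabelCounts = Dict[str, Dict[str, int]]


@dataclass
class ResourcesReport:
    """Hypothetical stand-in for the ResourcesData message."""
    node_id: str
    actor_labels_changed: bool = False
    actor_labels: LabelCounts = field(default_factory=dict)


class ReportingNode:
    """Sender side: remembers the labels it reported last time."""

    def __init__(self, node_id: str) -> None:
        self.node_id = node_id
        self.actor_labels: LabelCounts = {}
        self._last_reported: Optional[LabelCounts] = None

    def build_report(self) -> ResourcesReport:
        # Copy the labels so later local changes do not alter a sent report.
        snapshot = {key: dict(values) for key, values in self.actor_labels.items()}
        if snapshot == self._last_reported:
            # Nothing changed: send the flag unset and no label payload.
            return ResourcesReport(self.node_id)
        self._last_reported = snapshot
        return ResourcesReport(self.node_id, True, snapshot)


def apply_report(cluster_view: Dict[str, LabelCounts], report: ResourcesReport) -> None:
    """Receiver side: overwrite a node's labels only when the flag is set."""
    if report.actor_labels_changed:
        cluster_view[report.node_id] = report.actor_labels
```

The receiver must check the flag before overwriting, because an unchanged report carries an empty label map that would otherwise wipe the node's labels.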
437+
438+
439+ 3. Add the ActorLabels information to NodeResources during actor scheduling.
440+
441+ a. When the Raylet successfully schedules an actor, add the actor's ActorLabels information to the target remote node's entry in the ClusterResourceManager.
442+ ```
443+ ClusterTaskManager::ScheduleAndDispatchTasks()
444+   auto scheduling_node_id = cluster_resource_scheduler_->GetBestSchedulableNode(...)
445+   ScheduleOnNode(node_id, work)
446+   cluster_resource_scheduler_->AllocateRemoteTaskResources(node_id, resources)
447+   cluster_resource_scheduler_->GetClusterResourceManager().AddActorLabels(node_id, actor)
448+ ```
449+ b. When the actor is dispatched to a worker, add its ActorLabels information to the LocalResourceManager.
450+ ```
451+ LocalTaskManager::DispatchScheduledTasksToWorkers()
452+   cluster_resource_scheduler_->GetLocalResourceManager().AllocateLocalTaskResources
453+   cluster_resource_scheduler_->GetLocalResourceManager().AddActorLabels(actor)
454+   worker_pool_.PopWorker()
455+ ```
456+
457+ c. When the actor is destroyed, remove its ActorLabels information from the LocalResourceManager.
458+ ```
459+ NodeManager::HandleReturnWorker
460+   local_task_manager_->ReleaseWorkerResources(worker);
461+   local_resource_manager_->RemoveActorLabels(actor_id);
462+ ```
463+
464+ Actor scheduling flowchart:
465+ ![Actor scheduling flowchart](https://user-images.githubusercontent.com/11072802/202128385-f72609c5-308d-4210-84ff-bf3ba6df381c.png)
466+
467+ Node Resources synchronization mechanism:
468+ ![Node Resources synchronization mechanism](https://user-images.githubusercontent.com/11072802/202128406-b4745e6e-3565-41a2-bfe3-78843379bf09.png)
469+
470+ 4. Optimize scheduling with an ActorLabels index.
471+ At this point, the raylet on every node has the ActorLabels information for all nodes.
472+ However, if ActorAffinity scheduling has to traverse the labels of every actor on every node, the cost grows with the total number of actors in the cluster and scheduling performance will be poor.
473+ <b>Therefore, a cluster-wide ActorLabels index table is needed to improve scheduling performance.</b>
474+
475+ ```
476+ class GcsLabelManager {
477+  public:
478+   absl::flat_hash_set<NodeID> GetNodesByKeyAndValue(const std::string &ray_namespace,
479+       const std::string &key, const absl::flat_hash_set<std::string> &values) const;
480+
481+   absl::flat_hash_set<NodeID> GetNodesByKey(const std::string &ray_namespace,
482+       const std::string &key) const;
483+
484+   void AddActorLabels(const std::shared_ptr<GcsActor> &actor);
485+
486+   void RemoveActorLabels(const std::shared_ptr<GcsActor> &actor);
487+
488+  private:
489+   <namespace, <label_key, <label_value, [node_id]>>> labels_to_nodes_;
490+   <node_id, <namespace, [actor]>> nodes_to_actors_;
491+ };
492+ ```
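
To make the lookup behaviour of such an index concrete, here is a small Python sketch. It mirrors the method names of the class above in snake_case, but `LabelIndex` itself is hypothetical. As a simplifying assumption it tracks a per-node reference count instead of the `nodes_to_actors_` map, so it illustrates the idea and is not the proposed implementation.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set


class LabelIndex:
    """Hypothetical Python analogue of the cluster-wide label index."""

    def __init__(self) -> None:
        # namespace -> label_key -> label_value -> set of node ids
        self._labels_to_nodes = defaultdict(
            lambda: defaultdict(lambda: defaultdict(set)))
        # (namespace, key, value, node_id) -> number of actors carrying the label
        self._ref_counts = defaultdict(int)

    def add_actor_labels(self, namespace: str, node_id: str,
                         labels: Dict[str, str]) -> None:
        for key, value in labels.items():
            self._ref_counts[(namespace, key, value, node_id)] += 1
            self._labels_to_nodes[namespace][key][value].add(node_id)

    def remove_actor_labels(self, namespace: str, node_id: str,
                            labels: Dict[str, str]) -> None:
        for key, value in labels.items():
            ref = (namespace, key, value, node_id)
            if self._ref_counts.get(ref, 0) == 0:
                continue
            self._ref_counts[ref] -= 1
            if self._ref_counts[ref] == 0:
                # Last actor with this label left the node.
                del self._ref_counts[ref]
                self._labels_to_nodes[namespace][key][value].discard(node_id)

    def get_nodes_by_key_and_value(self, namespace: str, key: str,
                                   values: Iterable[str]) -> Set[str]:
        by_value = self._labels_to_nodes.get(namespace, {}).get(key, {})
        nodes: Set[str] = set()
        for value in values:
            nodes |= by_value.get(value, set())
        return nodes

    def get_nodes_by_key(self, namespace: str, key: str) -> Set[str]:
        by_value = self._labels_to_nodes.get(namespace, {}).get(key, {})
        return set().union(*by_value.values())
```

Each lookup touches only the entries for the requested namespace and key, so its cost depends on the number of matching values and nodes and not on the total number of actors in the cluster.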
493+
494+ <b>Advantages:</b>
495+ 1. Like the alternative of putting Labels into custom resources, this scheme can reuse the resource synchronization mechanism, but it does not break the concept of custom resources.
496+
497+ <b>Disadvantages:</b>
498+ 1. Resource synchronization under raylet scheduling always has some delay, so scheduling may be inaccurate when actor affinity uses Soft semantics.
499+
500+
404501### Failures and Special Scenarios
405502#### 1. If the Match Expression Cannot Be Satisfied
406503If the match expression cannot be satisfied, the actor is added to the pending actor queue, where it stays until the match expression can be fully satisfied.
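
The pending-queue behaviour described above can be sketched in Python as follows. `PendingActorQueue` and the `find_node` callback are hypothetical. The callback stands in for whatever evaluates the match expression against the current labels, and returns a node id or `None`.

```python
from collections import deque
from typing import Callable, Deque, Dict, List, Optional

# Hypothetical callback: returns a node id if the actor's match expression
# can be satisfied right now, otherwise None.
FindNode = Callable[[str], Optional[str]]


class PendingActorQueue:
    def __init__(self, find_node: FindNode) -> None:
        self._find_node = find_node
        self._pending: Deque[str] = deque()
        self.placements: Dict[str, str] = {}

    def submit(self, actor_id: str) -> bool:
        """Place the actor now, or park it in the pending queue."""
        node = self._find_node(actor_id)
        if node is None:
            self._pending.append(actor_id)
            return False
        self.placements[actor_id] = node
        return True

    def retry_pending(self) -> List[str]:
        """Re-evaluate every pending actor once, e.g. after labels change."""
        scheduled = []
        for _ in range(len(self._pending)):
            actor_id = self._pending.popleft()
            node = self._find_node(actor_id)
            if node is None:
                # Still unsatisfied: keep waiting, preserving queue order.
                self._pending.append(actor_id)
            else:
                self.placements[actor_id] = node
                scheduled.append(actor_id)
        return scheduled

    def pending(self) -> List[str]:
        return list(self._pending)
```

In this sketch a retry is triggered explicitly. When the real scheduler re-evaluates pending actors is not specified here.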
@@ -428,3 +525,11 @@ All APIs will be fully unit tested. All specifications in this documentation wil
428525
429526## (Optional) Follow-on Work
430527
528+ ### Expression of "OR" semantics
529+ If needed later, "OR" semantics can be supported by adding an "is_or_semantics" parameter to ActorAffinitySchedulingStrategy.
530+ ```
531+ class ActorAffinitySchedulingStrategy:
532+     def __init__(self, match_expressions: List[ActorAffinityMatchExpression], is_or_semantics=False):
533+         self.match_expressions = match_expressions
534+         self.is_or_semantics = is_or_semantics
535+ ```
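
A minimal sketch of how such a flag could change evaluation is shown below. It assumes the default combines match expressions with AND, which the `False` default suggests. `LabelIn` is a hypothetical, simplified stand-in for `ActorAffinityMatchExpression` that only supports an "in" check, and `node_matches` is an illustrative helper.

```python
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass
class LabelIn:
    """Hypothetical expression: label `key` must take one of `values`."""
    key: str
    values: Set[str]

    def matches(self, node_labels: Dict[str, Set[str]]) -> bool:
        # True if the node has at least one of the wanted values for this key.
        return bool(node_labels.get(self.key, set()) & self.values)


def node_matches(node_labels: Dict[str, Set[str]],
                 expressions: List[LabelIn],
                 is_or_semantics: bool = False) -> bool:
    """AND all expressions by default; OR them when is_or_semantics is set."""
    results = (expr.matches(node_labels) for expr in expressions)
    return any(results) if is_or_semantics else all(results)
```

For a node labeled `location=dc_1, version=v1`, the expressions `location in {dc_1}` and `version in {v2}` fail under AND but succeed under OR.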