Skip to content

Set the knowledge about Launcher and Worker to CRD #519

@tenzen-y

Description

@tenzen-y

/kind feature

We have API violations related to maps in the following:

// MPIReplicaSpecs contains maps from `MPIReplicaType` to `ReplicaSpec` that
// specify the MPI replicas to run.
MPIReplicaSpecs map[MPIReplicaType]*common.ReplicaSpec `json:"mpiReplicaSpecs"`

// replicaStatuses is map of ReplicaType and ReplicaStatus,
// specifies the status of each replica.
// +optional
ReplicaStatuses map[MPIReplicaType]*ReplicaStatus `json:"replicaStatuses,omitempty"`

So we should use named subobjects instead of maps in the following:

type MPIReplicaSpecs struct {
  Launcher common.ReplicaSpec `json:"launcher"`
  Worker common.ReplicaSpec     `json:"worker"`
}
type MPIReplicaStatuses struct {
  Launcher *ReplicaStatus `json:"launcher"`
  Worker *ReplicaStatus     `json:"worker"`
}

However, we should work on this in the new API version such as v2beta2 or v2 since this change doesn't have backward compatibility.

Original comments by @alculquicondor.

We are losing the knowledge about Launcher and Worker here.
But this is a problem with the API structs themselves.

We shouldn't be using a map https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/api-conventions.md#lists-of-named-subobjects-preferred-over-maps

Maybe we can fix it in a v2(beta2) API, but it would be nice to fix other training job objects, like so:

type MPIReplicaSpecs struct {
  Launcher common.ReplicaSpec `json:"launcher"`
  Worker common.ReplicaSpec `json:"worker"`
 }

#510 (comment)

another violation of API conventions T_T
But we can only change it in a new API version.

#514 (comment)

Metadata

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions