
Supporting Distributed OptiGraphs#144

Merged
dlcole3 merged 61 commits into plasmo-dev:main from dlcole3:dc/distributed
Nov 6, 2025

Conversation

@dlcole3
Collaborator

@dlcole3 dlcole3 commented Aug 4, 2025

This PR supports distributing OptiGraphs. Most of the source code is contained in the new distributed folder. A set of unit tests and two pages of documentation have also been added.

The core structures are outlined in distributed/core_types.jl and include the RemoteOptiGraph, RemoteNodeRef, RemoteVariableRef, RemoteEdgeRef, and RemoteOptiEdge. Passing any of these objects directly to the remote worker also results in passing the whole RemoteOptiGraph to the remote worker, so proxy objects have also been defined that allow for only passing the minimum data to create the corresponding objects or accessing the objects on the remote.

In defining these new structures, I have tried to extend the same functions that have been extended for Plasmo. One exception is the relax_integrality function which could perhaps still be included in this PR.
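The proxy idea described above can be illustrated with a small Distributed sketch. This is not the PR's actual API — `NodeProxy` and `LOCAL_NODES` are hypothetical stand-ins — but it shows why shipping a lightweight reference avoids serializing the graph that owns the object:

```julia
using Distributed
addprocs(1)

# Hypothetical proxy: holds only the owning worker id and an index, so
# sending it between processes copies a few bytes, not the whole graph.
@everywhere struct NodeProxy
    worker::Int
    index::Int
end

# Stand-in for per-worker node storage (each worker gets its own Dict).
@everywhere const LOCAL_NODES = Dict{Int,String}()

# Create a "node" on worker 2 and return only a proxy to it.
proxy = remotecall_fetch(2) do
    LOCAL_NODES[1] = "node data"
    NodeProxy(myid(), 1)
end

# Resolve the proxy on its owning worker to access the real object.
data = remotecall_fetch(p -> LOCAL_NODES[p.index], proxy.worker, proxy)
```

The same shape presumably applies to the PR's variable, node, and edge references: the proxy travels, the data stays put.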

Member

@jalving jalving left a comment


This PR is very comprehensive. I only commented on minor things. For the sake of getting a working distributed version released, I think it is okay to release soon since base Plasmo code is hardly touched.

In the future, it would be great to find ways to cut down on duplicate implementations of the same Plasmo/JuMP interface, but this might be unavoidable given how JuMP extensions work.
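One common way to trim duplicated wrapper methods in Julia is to generate forwarding definitions with an `@eval` loop. A minimal sketch (the `Wrapper` type here is illustrative; whether this pattern fits the JuMP extension interface is exactly the open question above):

```julia
struct Wrapper
    inner::Vector{Int}   # stand-in for the wrapped graph/model
end

# Generate one forwarding method per listed function instead of
# writing each wrapper method out by hand.
for f in (:length, :isempty, :first)
    @eval Base.$f(w::Wrapper) = Base.$f(w.inner)
end

w = Wrapper([1, 2, 3])
length(w)   # forwards to length(w.inner)
```

This works well for simple pass-through queries; methods that must translate between local and remote reference types still need hand-written bodies.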

@jalving
Member

jalving commented Aug 18, 2025

I get this error from distributed when I run tests locally:

Unhandled Task ERROR: IOError: stream is closed or unusable
Stacktrace:
  [1] check_open
    @ ./stream.jl:388 [inlined]
  [2] uv_write_async(s::Base.PipeEndpoint, p::Ptr{UInt8}, n::UInt64)
    @ Base ./stream.jl:1108
  [3] uv_write(s::Base.PipeEndpoint, p::Ptr{UInt8}, n::UInt64)
    @ Base ./stream.jl:1069
  [4] unsafe_write(s::Base.PipeEndpoint, p::Ptr{UInt8}, n::UInt64)
    @ Base ./stream.jl:1154
  [5] write
    @ ./strings/io.jl:248 [inlined]
  [6] print
    @ ./strings/io.jl:250 [inlined]
  [7] print(::Base.PipeEndpoint, ::String, ::String)
    @ Base ./strings/io.jl:46
  [8] println(::Base.PipeEndpoint, ::String, ::Vararg{Any})
    @ Base ./strings/io.jl:75
  [9] println(xs::String)
    @ Base ./coreio.jl:4
 [10] (::Distributed.var"#37#38"{Int64, Base.PipeEndpoint})()
    @ Distributed ~/.julia/juliaup/julia-1.11.5+0.x64.linux.gnu/share/julia/stdlib/v1.11/Distributed/src/cluster.jl:289

But tests still pass.

I think using @suppress causes issues with distributed with how stdout gets forwarded. I think it should be simplest to just use JuMP.set_silent(graph). You might need to set a HiGHS optimizer attribute as well.
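For a plain JuMP model the suggestion looks like the sketch below; `set_silent` sets the solver-independent silencing attribute, and `output_flag` is the HiGHS-specific log switch alluded to above. How this maps onto a `RemoteOptiGraph` depends on the extension:

```julia
using JuMP, HiGHS

model = Model(HiGHS.Optimizer)
set_silent(model)                           # solver-independent silencing
set_attribute(model, "output_flag", false)  # HiGHS-specific log switch

@variable(model, x >= 0)
@objective(model, Min, x)
optimize!(model)
```

Unlike `@suppress`, neither call touches the process's stdout stream, so nothing is left for Distributed's output forwarding to trip over.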

@dlcole3
Collaborator Author

dlcole3 commented Aug 18, 2025

> I get this error from distributed when I run tests locally: `Unhandled Task ERROR: IOError: stream is closed or unusable` […] But tests still pass. I think using @suppress causes issues with distributed with how stdout gets forwarded. I think it should be simplest to just use JuMP.set_silent(graph).

I have had this error come up a couple of times for me, but it did not always seem to be reproducible. I will take a closer look at the tests and see if it is from @suppress or something else. At least once or twice when this error came up for me, I reran the code and it worked the second time. I'll see if I can recreate the error and figure out what is throwing it.

@dlcole3
Collaborator Author

dlcole3 commented Aug 20, 2025

@jalving Another thing I would be interested in your feedback on for the code review is how best to distinguish local link constraints on the remote graph. Some of this became a little more clear when I was writing the tests. There are RemoteEdgeRefs and RemoteOptiEdges for link constraints. The RemoteEdgeRefs are references to edges that exist on the OptiGraph on the remote worker, while RemoteOptiEdges exist on the main worker and link across 2 or more RemoteOptiGraph objects. I chose to make separate objects for these so that it is easy to distinguish between them. For instance, PlasmoBenders will only use the RemoteOptiEdges because they link the subproblems.
A challenge that comes from this is deciding what these functions should return. For instance, num_constraints(::RemoteOptiGraph) returns the number of constraints stored on remote workers and does not count the constraints stored on the RemoteOptiEdges (but maybe it should). Happy to rewrite some of these functions based on your suggestions. I also have all_link_constraints, which returns constraints on RemoteOptiEdges, and all_remote_link_constraints, which returns the constraints stored on RemoteEdgeRefs (that is, the latter returns the link constraints on the remote worker). This may not be the best way of doing it, but regardless of what we do, I think I should add a page to the documentation that explains this more clearly before we merge the PR.
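The two accessors described above can be summarized in pseudocode (illustrative only — helper names like `workers_of` and `num_constraints_on_worker` are not part of the PR):

```julia
# all_link_constraints(rg)        -> constraints on the RemoteOptiEdges
#                                    (cross-worker links held on the main worker)
# all_remote_link_constraints(rg) -> constraints on the RemoteEdgeRefs
#                                    (links living inside each worker's OptiGraph)

# One option for the num_constraints question: count both sources.
function num_constraints_including_links(rg)
    on_workers   = sum(num_constraints_on_worker(rg, w) for w in workers_of(rg))
    cross_worker = length(all_link_constraints(rg))
    return on_workers + cross_worker
end
```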

I think differentiating between types of edges as you did is necessary for the distributed implementation. Plasmo already takes a centralized approach to handling subgraphs (parent graphs own linking information), so denoting RemoteOptiEdge for links across workers is consistent with the abstraction. I see the RemoteOptiGraph that you implemented as an effective way to refer to an optigraph on a worker and contain the extra information about the process connectivity. The only confusion for me is whether RemoteOptiEdge is the name we want to go with since technically the edge itself is not 'remote'; it connects remote objects.

For querying the remote graph, my opinion is that num_constraints(graph::RemoteOptiGraph) should return all the constraints (including cross-process constraints) to be consistent with the optigraph interface. We can still have other specific methods for differentiating between the two types of constraints.

@jalving I think I am going to go with InterWorkerEdges unless you have a better idea for the name.

@dlcole3
Collaborator Author

dlcole3 commented Aug 20, 2025

> I get this error from distributed when I run tests locally: `Unhandled Task ERROR: IOError: stream is closed or unusable` […] But tests still pass. I think using @suppress causes issues with distributed with how stdout gets forwarded. I think it should be simplest to just use JuMP.set_silent(graph).

I updated the script to skip @suppress on the remote graph optimize calls, and I extended JuMP.set_silent (this was missed before). I am not able to replicate this error, and the tests all pass on my machine (with the exception of the KaHyPar tests, but they pass on GitHub). Let me know if you have any issues with the tests now.

@jalving
Member

jalving commented Aug 23, 2025

> I updated the script to skip @suppress on the remote graph optimize calls, and I extended JuMP.set_silent (this was missed before). […] Let me know if you have any issues with the tests now.

This fixed it 👍

@dlcole3
Collaborator Author

dlcole3 commented Aug 24, 2025

@jalving I renamed RemoteOptiEdge to InterWorkerEdge. The OptiGraph object has a field called optiedges, and I followed a similar approach for the RemoteOptiGraph, where it has a field optiedges, but since we are calling these InterWorkerEdges, do you think we should still call this field optiedges? It is more consistent with the OptiGraph object, and they are effectively optiedges, but I was uncertain if this could cause some confusion or not.

@jalving
Member

jalving commented Aug 24, 2025

> I renamed RemoteOptiEdge to InterWorkerEdge. The OptiGraph object has a field called optiedges, and I followed a similar approach for the RemoteOptiGraph […] do you think we should still call this field optiedges?

Keeping the attribute name as optiedges is fine. They are already typed as InterWorkerEdges. Users are supposed to use the getter functions anyways 🙂

@jalving
Member

jalving commented Aug 24, 2025

@dlcole3 feel free to merge when ready

@dlcole3 dlcole3 merged commit 1ddb773 into plasmo-dev:main Nov 6, 2025
5 checks passed


Development

Successfully merging this pull request may close these issues.

num_constraints should ignore variable constraints by default.
