distributed.py - definitely needs update from stopevent; having headaches coding #101

@drallensmith

@bennr01: Hi. I wanted to check that distributed evaluation works with checkpointing. Classes that aren't defined at the top level of a file, such as the _EvaluatorSyncManager, can be problematic for that, which is one reason for my concern. So I put together a test case, test_xor_example_distributed.py, which you can see in the config_work branch of my fork. However, with no alterations to distributed.py, it usually failed with a timeout error while waiting for the stopevent to connect. The one time it didn't, it thought the DistributedEvaluator was still running (since de.stop() did not reset de.started to False), and so errored out anyway. The fix for the latter error is easy; the former, assuming it's a problem with using an event instead of a Value, is another matter.
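For reference, the easy half of this (resetting the flag in stop()) might look roughly like the sketch below. ToyEvaluator is a hypothetical stand-in, not the real DistributedEvaluator; only the started attribute and the start()/stop() pairing are taken from the description above, and the error handling is an assumption.

```python
class ToyEvaluator:
    """Hypothetical stand-in for an evaluator that tracks whether it is running."""

    def __init__(self):
        self.started = False

    def start(self):
        # Refuse to start twice; this is the check that misfires
        # if stop() forgets to clear the flag.
        if self.started:
            raise RuntimeError("evaluator already started")
        self.started = True

    def stop(self):
        if not self.started:
            raise RuntimeError("evaluator not started")
        # ... shut down workers / manager connections here ...
        self.started = False  # reset so a later start() is accepted
```

With the reset in place, a start/stop/start sequence on the same object no longer trips the "already started" check.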

I've been trying to replace the stopevent, but am having some major headaches getting a proxy value shared between multiple processes that may be on different machines. My initial try, putting a Value from the SyncManager into the queue as the first entry, would have worked, except that the other secondary processes didn't get the Value. Oops. My most recent try, using a namespace in place of the Value, isn't working either. Any suggestions?
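One pattern that might be worth trying is to avoid shipping the shared object through the queue at all and instead register a callable on the manager, so that every secondary asks the primary's manager for a proxy to the same object. Below is a minimal sketch of that idea using only multiprocessing.managers.BaseManager. All names here (StopFlag, PrimaryManager, SecondaryManager, serve_primary, connect_secondary, get_stop_flag) are made up for illustration and are not from distributed.py, and the server runs in a thread only to keep the example in a single file.

```python
import threading
from multiprocessing.managers import BaseManager


class StopFlag:
    """Plain object that lives in the primary; secondaries reach it via proxies."""

    def __init__(self):
        self._lock = threading.Lock()
        self._stopped = False

    def set(self):
        with self._lock:
            self._stopped = True

    def is_set(self):
        with self._lock:
            return self._stopped


class PrimaryManager(BaseManager):
    """Hypothetical manager class used on the primary side."""


class SecondaryManager(BaseManager):
    """Hypothetical manager class used on the secondary side."""


def serve_primary(address, authkey):
    """Create the flag and serve it; returns (flag, actual bound address)."""
    flag = StopFlag()
    # Every call to get_stop_flag() hands out a proxy to this one object.
    PrimaryManager.register("get_stop_flag", callable=lambda: flag)
    manager = PrimaryManager(address=address, authkey=authkey)
    server = manager.get_server()
    # Assumption for this sketch: serve from a daemon thread of this process.
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return flag, server.address


def connect_secondary(address, authkey):
    """Connect to the primary's manager and return a proxy to its flag."""
    # The secondary only needs the name; the callable lives on the primary.
    SecondaryManager.register("get_stop_flag")
    manager = SecondaryManager(address=address, authkey=authkey)
    manager.connect()
    return manager.get_stop_flag()
```

Each secondary would call connect_secondary() with the primary's address and authkey and then poll is_set() on the returned proxy; because the proxy is fetched by name rather than taken off the queue, every secondary gets one. Whether this fits the existing _EvaluatorSyncManager setup is something I haven't checked.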

I also changed the "master/slave" terminology to "primary/secondary", which is more modern usage.
