-
Notifications
You must be signed in to change notification settings - Fork 335
[MLH Phase 2] Refactor Logging System #284
base: main
Are you sure you want to change the base?
[MLH Phase 2] Refactor Logging System #284
Conversation
979419f
to
bdf5fd1
Compare
bdf5fd1
to
e694347
Compare
e694347
to
fef448c
Compare
The TensorboardWriter can also log rich images and histograms
Co-authored-by: Grace Omotoso <[email protected]>
Implement TensorboardWriter
if key == "ep" | ||
else f"{stdout_data}{key}: {value}; " | ||
evt_stg.put_scalars( | ||
rolling_btime=rolling_btime, rolling_eta=rolling_eta_secs |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: I prefer keeping all times in human readable string. We may want to add a VisslEventStorage#put_string or change #put_scalars to #put_values and make it type agnostic.
# ----------------------------------------------------------------------------------- # | ||
TENSORBOARD_SETUP: | ||
# whether to use tensorboard for the visualization | ||
VISUALIZATION_SAMPLE_PERIOD: -1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: comment what this is.
nit: move comment below USE_TENSORBOARD
|
||
|
||
def default_hook_generator(cfg: AttrDict) -> List[ClassyHook]: | ||
def default_hook_generator(cfg: AttrDict, event_storage) -> List[ClassyHook]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Type hint for event_storage.
sample = next(task.data_iterator) | ||
|
||
sample = construct_sample_for_model(sample, task) | ||
vis_period = task.config.HOOKS.TENSORBOARD_SETUP.VISUALIZATION_SAMPLE_PERIOD |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Can we comment this code block?
storage = task.event_storage | ||
storage.put_images() | ||
name = f"Model input sample: iteration: {task.iteration_num}" | ||
for idx, vis_img in enumerate(sample["input"]): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thought: If batch size is large, this will upload a lot of images -- do we want to offer the option to only sample a small portion of the inputs?
several backends (json, tensorboard, etc) | ||
""" | ||
|
||
def __init__(self, start_iter=0): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we need to add logic that ensures that we set the start_iter correctly upon loading a checkpoint, I don't see this logic anywherere?
"tensorboard not installed, cannot use SSLTensorboardHook" | ||
) | ||
logging.info("Setting up SSL Tensorboard Hook...") | ||
self.tb_writer = tb_writer |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need tb_writer anymore in this class? Can we refactor ActivationStatisticsMonitor
to use event_storage?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
confirming: none of these methods write to tensorboard anymore and that it's only handled in TensorBoardWriter#Write.
evt_stg.put_scalars( | ||
rolling_btime=rolling_btime, rolling_eta=rolling_eta_secs | ||
) | ||
logging.info(stdout_data.strip()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So I believe we lose the logging to stdout here, which imo we should definitely keep -- super helpful for debugging.
We could create another EventStorageWriter called StdOutWriter and add to event_storage_writers.
] | ||
|
||
@property | ||
def event_storage(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can just use self.event_storage internally and externally.
Add a scalar `value` to the `HistoryBuffer` associated with `name`. | ||
""" | ||
history = self._history[name] | ||
value = float(value) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
question: Do we need this?
@iseessel I see some of the comments are unaddressed, do we need to work on these? Lmk if there is anything I could do! |
Summary: Pull Request resolved: fairinternal/ssl_scaling#284 Reviewed By: odelalleau Differential Revision: D42220017 Pulled By: QuentinDuval fbshipit-source-id: 742419aa859fdbe4bc80f1f9e9f4771fee0f41a2
The new logging system will be based around an event storage for each
SelfSupervisionTask
and events will be logged to their corresponding output device byVisslEventWriter
s like theJsonWriter
andTensorboardWriter