PyTorch Lightning 0.9: Synced BatchNorm, DataModules and final API
Overview
The newest PyTorch Lightning release includes final API clean-up with better data decoupling and shorter logging syntax.
We're happy to release PyTorch Lightning 0.9 today, which contains many great new features and more bug fixes than any release we've ever had, but most importantly it introduces our mostly-final API changes! Lightning is being adopted by top researchers and AI labs around the world, and we are working hard to make sure we provide a smooth experience and support for all the latest best practices.
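The centerpiece of the improved data decoupling is the new `LightningDataModule`, which packages downloading, splitting and dataloader creation into a single reusable object that the Trainer can consume directly. Below is a minimal sketch; the MNIST dataset, directory, batch size and split sizes are purely illustrative, and the exact hook signatures may differ slightly from your installed version.

```python
import pytorch_lightning as pl
from torch.utils.data import DataLoader, random_split
from torchvision import datasets, transforms


class MNISTDataModule(pl.LightningDataModule):
    """Groups all data-related logic so it can be swapped independently of the model."""

    def __init__(self, data_dir="./data", batch_size=32):
        super().__init__()
        self.data_dir = data_dir
        self.batch_size = batch_size

    def prepare_data(self):
        # Called once on a single process: download only, assign no state here.
        datasets.MNIST(self.data_dir, train=True, download=True)
        datasets.MNIST(self.data_dir, train=False, download=True)

    def setup(self, stage=None):
        # Called on every process: build and split the datasets.
        transform = transforms.ToTensor()
        if stage in (None, "fit"):
            full = datasets.MNIST(self.data_dir, train=True, transform=transform)
            self.train_set, self.val_set = random_split(full, [55000, 5000])
        if stage in (None, "test"):
            self.test_set = datasets.MNIST(self.data_dir, train=False, transform=transform)

    def train_dataloader(self):
        return DataLoader(self.train_set, batch_size=self.batch_size, shuffle=True)

    def val_dataloader(self):
        return DataLoader(self.val_set, batch_size=self.batch_size)

    def test_dataloader(self):
        return DataLoader(self.test_set, batch_size=self.batch_size)


# DataModule hooks are called implicitly by the trainer (#2755):
# dm = MNISTDataModule()
# trainer = pl.Trainer(gpus=1)
# trainer.fit(model, datamodule=dm)
```

Because the trainer now calls the DataModule hooks implicitly, the same data pipeline can be reused across different models without extra glue code.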
Detailed changes
Added
- Added SyncBN for DDP (#2801, #2838) (example below)
- Added basic `CSVLogger` (#2721) (example below)
- Added SSIM metrics (#2671)
- Added BLEU metrics (#2535)
- Added support to export a model to ONNX format (#2596) (example below)
- Added support for `Trainer(num_sanity_val_steps=-1)` to check all validation data before training (#2246) (example below)
- Added structured output
- Added class `LightningDataModule` (#2668)
- Added support for PyTorch 1.6 (#2745)
- Added calling DataModule hooks implicitly in the trainer (#2755)
- Added support for Mean in DDP Sync (#2568)
- Added remaining `sklearn` metrics: `AveragePrecision`, `BalancedAccuracy`, `CohenKappaScore`, `DCG`, `Hamming`, `Hinge`, `Jaccard`, `MeanAbsoluteError`, `MeanSquaredError`, `MeanSquaredLogError`, `MedianAbsoluteError`, `R2Score`, `MeanPoissonDeviance`, `MeanGammaDeviance`, `MeanTweedieDeviance`, `ExplainedVariance` (#2562)
- Added support for `limit_{mode}_batches (int)` to work with an infinite dataloader (IterableDataset) (#2840)
- Added support for returning Python scalars in DP (#1935)
- Added support to the TensorBoard logger for OmegaConf `hparams` (#2846)
- Added tracking of basic states in `Trainer` (#2541)
- Tracks all outputs including TBPTT and multiple optimizers (#2890)
- Added GPU Usage Logger (#2932)
- Added `strict=False` for `load_from_checkpoint` (#2819) (example below)
- Added saving test predictions on multiple GPUs (#2926)
- Auto log the computational graph for loggers that support this (#3003)
- Added warning when changing monitor and using results obj (#3014)
- Added a hook `transfer_batch_to_device` to the `LightningDataModule` (#3038) (example below)
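Several of the additions above surface directly as `Trainer` arguments or loggers. A hedged sketch of how they could be combined, assuming the synced-BatchNorm flag is spelled `sync_batchnorm` and that `CSVLogger` is importable from `pytorch_lightning.loggers`:

```python
import pytorch_lightning as pl
from pytorch_lightning.loggers import CSVLogger

trainer = pl.Trainer(
    gpus=2,
    distributed_backend="ddp",     # SyncBN is only meaningful with a DDP-style backend
    sync_batchnorm=True,           # convert BatchNorm layers to synced BatchNorm (#2801, #2838)
    num_sanity_val_steps=-1,       # sanity-check on *all* validation batches before training (#2246)
    logger=CSVLogger(save_dir="logs", name="my_experiment"),  # basic CSV logging (#2721)
)
# trainer.fit(model, datamodule=dm)
```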
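ONNX export (#2596) is exposed on the model itself. A minimal sketch, assuming the method is named `to_onnx` and accepts a sample input for tracing; the file name and input shape are illustrative:

```python
import torch

# "model" is any trained LightningModule; adjust the input shape to your network.
sample = torch.randn(1, 28 * 28)
model.to_onnx("model.onnx", input_sample=sample)

# The exported file can then be served with any ONNX runtime, for example:
# import onnxruntime
# session = onnxruntime.InferenceSession("model.onnx")
```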
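`strict=False` for `load_from_checkpoint` (#2819) mirrors the behaviour of `torch.nn.Module.load_state_dict(strict=False)`: mismatched keys no longer raise an error. A short sketch, where the model class and checkpoint path are placeholders:

```python
# Load weights even if the checkpoint's state_dict does not exactly match the
# current model, e.g. after adding a new head to the architecture.
model = MyLightningModel.load_from_checkpoint(
    "checkpoints/last.ckpt",
    strict=False,  # skip missing / unexpected keys instead of raising
)
```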
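The new `transfer_batch_to_device` hook on `LightningDataModule` (#3038) lets you control how non-standard batch objects are moved to the target device. A hedged sketch, assuming the hook receives the batch and the device; the `CustomBatch` container is made up for illustration:

```python
from dataclasses import dataclass

import torch
import pytorch_lightning as pl


@dataclass
class CustomBatch:
    # Illustrative container produced by a custom collate_fn.
    inputs: torch.Tensor
    targets: torch.Tensor


class MyDataModule(pl.LightningDataModule):
    def transfer_batch_to_device(self, batch, device):
        # Lightning moves plain tensors, lists and dicts automatically; a custom
        # object has to be moved field by field.
        if isinstance(batch, CustomBatch):
            batch.inputs = batch.inputs.to(device)
            batch.targets = batch.targets.to(device)
        return batch
```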
Changed
- Truncated long version numbers in progress bar (#2594)
- Enabled disabling of the val/test loop (#2692)
- Refactored into `accelerator` module
- Using `.comet.config` file for `CometLogger` (#1913)
- Updated hooks arguments - breaking for `setup` and `teardown` (#2850)
- Using `gfile` to support remote directories (#2164)
- Moved optimizer creation after device placement for DDP backends (#2904)
- Support for `**DictConfig` for `hparam` serialization (#2519)
- Removed callback metrics from test results obj (#2994)
- Re-enabled naming metrics in ckpt name (#3060)
- Changed progress bar epoch counting to start from 0 (#3061)
Deprecated
- Deprecated Trainer attribute `ckpt_path`, which will now be set by `weights_save_path` (#2681)
Removed
- Removed deprecated: (#2760)
  - core decorator `data_loader`
  - Module hook `on_sanity_check_start` and loading `load_from_metrics`
  - package `pytorch_lightning.logging`
  - Trainer arguments: `show_progress_bar`, `num_tpu_cores`, `use_amp`, `print_nan_grads`
  - LR Finder argument `num_accumulation_steps`
Fixed
- Fixed `accumulate_grad_batches` for last batch (#2853)
- Fixed setup call while testing (#2624)
- Fixed local rank zero casting (#2640)
- Fixed single scalar return from training (#2587)
- Fixed Horovod backend to scale LR schedulers with the optimizer (#2626)
- Fixed `dtype` and `device` properties not getting updated in submodules (#2657)
- Fixed `fast_dev_run` to run for all dataloaders (#2581)
- Fixed `save_dir` in loggers getting ignored by the default value of `weights_save_path` when the user did not specify `weights_save_path` (#2681)
- Fixed `weights_save_path` getting ignored when `logger=False` is passed to the Trainer (#2681)
- Fixed TPU multi-core and Float16 (#2632)
- Fixed test metrics not being logged with `LoggerCollection` (#2723)
- Fixed data transfer to device when using `torchtext.data.Field` and `include_lengths is True` (#2689)
- Fixed shuffle argument for the distributed sampler (#2789)
- Fixed logging interval (#2694)
- Fixed wrong loss value in the progress bar when `accumulate_grad_batches > 1` (#2738)
- Fixed using the correct CWD for DDP sub-processes when using Hydra (#2719)
- Fixed selecting GPUs using `CUDA_VISIBLE_DEVICES` (#2739, #2796)
- Fixed false `num_classes` warning in metrics (#2781)
- Fixed shell injection vulnerability in subprocess call (#2786)
- Fixed LR finder and `hparams` compatibility (#2821)
- Fixed `ModelCheckpoint` not saving the latest information when `save_last=True` (#2881)
- Fixed ImageNet example: learning rate scheduler, number of workers and batch size when using DDP (#2889)
- Fixed apex gradient clipping (#2829)
- Fixed save apex scaler states (#2828)
- Fixed a model loading issue with inheritance and variable positional arguments (#2911)
- Fixed passing `non_blocking=True` when transferring a batch object that does not support it (#2910)
- Fixed checkpointing to remote file paths (#2925)
- Fixed adding `val_step` argument to metrics (#2986)
- Fixed an issue that caused `Trainer.test()` to stall in DDP mode (#2997)
- Fixed gathering of results with tensors of varying shape (#3020)
- Fixed batch size auto-scaling feature to set the new value on the correct model attribute (#3043)
- Fixed automatic batch scaling not working with half-precision (#3045)
- Fixed setting device to root GPU (#3042)
Contributors
@ananthsub, @ananyahjha93, @awaelchli, @bkhakshoor, @Borda, @ethanwharris, @f4hy, @groadabike, @ibeltagy, @justusschock, @lezwon, @nateraw, @neighthan, @nsarang, @PhilJd, @pwwang, @rohitgr7, @romesco, @ruotianluo, @shijianjian, @SkafteNicki, @tgaddair, @thschaaf, @williamFalcon, @xmotli02, @ydcjeff, @yukw777, @zerogerc
If we forgot someone due to not matching commit email with GitHub account, let us know :]