I was wondering why you used BFGS optimization instead of PyTorch's built-in Adam or gradient descent optimizers.