As discussed in this comment from sklearn, when the features of the dataset are not scaled, optimization methods can converge slowly.
Adding an example with such an ill-conditioned matrix would be very interesting.
The data generation mechanism is (quick extract, check this before coding :) ):
import numpy as np
from sklearn.datasets import make_low_rank_matrix

rng = np.random.RandomState(0)  # any seed; `rng` was undefined in the snippet
n_samples, n_features = 1000, 10000
w_true = rng.randn(n_features)
X = make_low_rank_matrix(n_samples, n_features, random_state=rng)
# Inflate the first and last features to make X ill-conditioned
X[:, 0] *= 1e3
X[:, -1] *= 1e3
z = X @ w_true + 1
z += 1e-1 * rng.randn(n_samples)
# Balanced binary classification problem
y = (z > np.median(z)).astype(np.int32)