DATA add ill-conditioned simulated data #5

As discussed in this [comment from sklearn](https://github.com/scikit-learn/scikit-learn/pull/15583#issuecomment-553964422), optimization methods can converge slowly when the features of the dataset are not scaled.

Adding an example with such an ill-conditioned matrix would be very interesting.
The data generation mechanism is (quick extract, check this before coding :) ):

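```python
import numpy as np
from sklearn.datasets import make_low_rank_matrix

# `rng` was not defined in the extract; the seed is an arbitrary choice
rng = np.random.RandomState(42)

n_samples, n_features = 1000, 10000

# Ground-truth weights
w_true = rng.randn(n_features)

# Low-rank design matrix; rescaling the first and last columns
# by 1e3 makes it ill-conditioned
X = make_low_rank_matrix(n_samples, n_features, random_state=rng)
X[:, 0] *= 1e3
X[:, -1] *= 1e3

# Linear response with intercept 1 and Gaussian noise
z = X @ w_true + 1
z += 1e-1 * rng.randn(n_samples)

# Balanced binary classification problem
y = (z > np.median(z)).astype(np.int32)
```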
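
To check that the rescaling really degrades conditioning, one option is to compare the ratio of extreme singular values before rescaling, after rescaling, and after standardizing the columns. This is only a sketch: the helper `condition_number`, the reduced shape (200 x 50) and the seed are assumptions made for illustration, not part of the proposal above.

```python
import numpy as np
from sklearn.datasets import make_low_rank_matrix
from sklearn.preprocessing import StandardScaler


def condition_number(A):
    """Ratio of the largest to the smallest singular value of A."""
    s = np.linalg.svd(A, compute_uv=False)  # sorted in descending order
    return s[0] / s[-1]


rng = np.random.RandomState(0)  # arbitrary seed (assumption)

# Small illustrative shape (assumption) so the SVD stays cheap
n_samples, n_features = 200, 50
X = make_low_rank_matrix(n_samples, n_features, random_state=rng)
cond_raw = condition_number(X)

# Same rescaling as in the snippet above: blow up the first and last columns
X_ill = X.copy()
X_ill[:, 0] *= 1e3
X_ill[:, -1] *= 1e3
cond_ill = condition_number(X_ill)

# Standardizing each column undoes the per-feature rescaling
X_std = StandardScaler().fit_transform(X_ill)
cond_std = condition_number(X_std)

print(f"raw: {cond_raw:.1f}, rescaled: {cond_ill:.1f}, standardized: {cond_std:.1f}")
```

The rescaled matrix should show a condition number a few orders of magnitude above the other two, which is the regime where unscaled solvers are expected to struggle.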
