Commit 7b5a169

Fix issues with dataset API doc (#163)
* Fix formatting error in dataset API doc by removing spaces
* Added links to other dataset descriptions.
* Added DirectionalConvexHull to selection docs.
* Added DCH to scikit-matter intro page.
* set nbsphinx version to 0.8.12 because 0.9 fails

---------

Co-authored-by: alexgo <[email protected]>
1 parent f75c51a commit 7b5a169

9 files changed: +94 -90 lines changed

docs/requirements.txt

Lines changed: 1 addition & 1 deletion
@@ -1,6 +1,6 @@
 ipykernel
 matplotlib
-nbsphinx
+nbsphinx==0.8.12
 nbconvert
 numpy
 scikit-learn >=0.24.0

docs/source/datasets.rst

Lines changed: 5 additions & 1 deletion
@@ -1,7 +1,11 @@
 Datasets
 ========

+.. include:: ../../skmatter/datasets/descr/csd-1000r.rst
+
 .. include:: ../../skmatter/datasets/descr/degenerate_CH4_manifold.rst

-.. include:: ../../skmatter/datasets/descr/csd-1000r.rst
+.. include:: ../../skmatter/datasets/descr/nice_dataset.rst
+
+.. include:: ../../skmatter/datasets/descr/who_dataset.rst

docs/source/intro.rst

Lines changed: 1 addition & 0 deletions
@@ -23,6 +23,7 @@ of computational materials science and chemistry.
 * :ref:`FPS-api`: a common selection technique intended to exploit the diversity of the input space. The selection of the first point is made at random or by a separate metric.
 * :ref:`PCov-FPS-api` extends upon FPS much like PCov-CUR does to CUR.
 * :ref:`Voronoi-FPS-api`: conduct FPS selection, taking advantage of Voronoi tessellations to accelerate selection.
+* :ref:`DCH-api`: selects samples by constructing a directional convex hull and determining which samples lie on the bounding surface.

 - Reconstruction Measures:
 A set of easily-interpretable error measures of the relative information capacity of feature space `F` with respect to feature space `F'`.
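
The FPS entry in the list above describes a greedy, diversity-driven selection. The following is a minimal pure-Python sketch of that idea, not the skmatter implementation: the function name `farthest_point_sampling` and the toy `points` are assumptions made for illustration, and plain Euclidean distance stands in for whatever metric the library uses.

```python
import math


def farthest_point_sampling(points, n_to_select, initialize=0):
    """Greedy FPS: repeatedly pick the point farthest from those already chosen."""
    selected = [initialize]
    # Distance from every point to its nearest selected point so far.
    min_dist = [math.dist(p, points[initialize]) for p in points]
    while len(selected) < n_to_select:
        # The next pick maximizes the distance to the current selection.
        nxt = max(range(len(points)), key=lambda i: min_dist[i])
        selected.append(nxt)
        # Refresh the nearest-selected distances with the new pick.
        min_dist = [
            min(d, math.dist(p, points[nxt])) for d, p in zip(min_dist, points)
        ]
    return selected


# Hypothetical toy data: five points in the plane.
points = [(0.0, 0.0), (0.1, 0.0), (1.0, 0.0), (0.0, 2.0), (0.9, 0.1)]
order = farthest_point_sampling(points, n_to_select=3, initialize=0)
print(order)
```

Here the first point is fixed through `initialize`, mirroring the note that the first selection is made at random or by a separate metric.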

docs/source/selection.rst

Lines changed: 26 additions & 0 deletions
@@ -292,3 +292,29 @@ spot for Voronoi FPS is when the number of selectable samples is already enough
 to divide the space with Voronoi polyhedrons, but not yet comparable to the total
 number of samples, when the cost of bookkeeping significantly degrades the speed
 of work compared to FPS.
+
+.. _DCH-api:
+
+Directional Convex Hull (DCH)
+#############################
+.. currentmodule:: skmatter.sample_selection._base
+
+.. autoclass :: DirectionalConvexHull
+
+This selector can be instantiated using `skmatter.sample_selection.DirectionalConvexHull`.
+
+.. code-block:: python
+
+    from skmatter.sample_selection import DirectionalConvexHull
+    selector = DirectionalConvexHull(
+        # Indices of columns of X to use for fitting
+        # the convex hull
+        low_dim_idx=[0,1],
+    )
+    selector.fit(X,y)
+
+    # Get the distance to the convex hull for samples used to fit the
+    # convex hull. This can also be called using other samples (X_new)
+    # and corresponding properties (y_new) that were not used to fit
+    # the hull.
+    Xr = selector.score_samples(X,y)
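
To make the hull idea concrete, here is a minimal pure-Python sketch for a single low-dimensional feature `x` and a property `y`. It is an illustration under stated assumptions, not the skmatter algorithm: the helpers `cross`, `lower_hull_indices` and `distance_above_hull` are hypothetical names, the hull is taken to be the lower hull in the `y` direction, and all `x` values are assumed distinct.

```python
def cross(o, a, b):
    """z-component of (a - o) x (b - o); positive for a counter-clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def lower_hull_indices(x, y):
    """Indices of the samples on the lower convex hull of the points (x[i], y[i])."""
    order = sorted(range(len(x)), key=lambda i: (x[i], y[i]))
    hull = []
    for i in order:
        # Drop the last vertex while it lies on or above the segment
        # joining its two neighbours.
        while len(hull) >= 2 and cross(
            (x[hull[-2]], y[hull[-2]]), (x[hull[-1]], y[hull[-1]]), (x[i], y[i])
        ) <= 0:
            hull.pop()
        hull.append(i)
    return hull


def distance_above_hull(x, y, hull, xq, yq):
    """Vertical distance from (xq, yq) to the hull, by linear interpolation."""
    for i, j in zip(hull, hull[1:]):
        if x[i] <= xq <= x[j]:
            t = (xq - x[i]) / (x[j] - x[i])
            return yq - (y[i] + t * (y[j] - y[i]))
    raise ValueError("query lies outside the x-range of the hull")


# Hypothetical toy data: one feature and one property per sample.
x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [0.0, -1.0, 0.5, -1.0, 0.0]

hull = lower_hull_indices(x, y)
distances = [distance_above_hull(x, y, hull, xi, yi) for xi, yi in zip(x, y)]
print(hull, distances)
```

Samples on the hull have zero distance, and the remaining sample sits above it. This is loosely analogous to reading off the selected samples and the scores in the snippet above.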

skmatter/datasets/descr/csd-1000r.rst

Lines changed: 54 additions & 3 deletions
@@ -46,10 +46,61 @@ Data Set Characteristics
 References
 ----------

-.. [C1] https://github.com/lab-cosmo/librascal commit ade202a6
-.. [C2] https://github.com/lab-cosmo/scikit-matter commit 4ed1d92
+.. [C1] https://github.com/lab-cosmo/librascal commit ade202a6
+.. [C2] https://github.com/lab-cosmo/scikit-matter commit 4ed1d92

 Reference Code
 --------------

-.. literalinclude:: /../../skmatter/datasets/make_csd_1000r.py
+.. code-block:: python
+
+    from skmatter.feature_selection import CUR
+    from skmatter.preprocessing import StandardFlexibleScaler
+    from skmatter.sample_selection import FPS
+
+    # read all of the frames and book-keep the centers and species
+    filename = "/path/to/CSD-1000R.xyz"
+    frames = np.asarray(
+        read(filename, ":"),
+        dtype=object,
+    )
+
+    n_centers = np.array([len(frame) for frame in frames])
+    center_idx = np.array([i for i, f in enumerate(frames) for p in f])
+    n_env_accum = np.zeros(len(frames) + 1, dtype=int)
+    n_env_accum[1:] = np.cumsum(n_centers)
+
+    numbers = np.concatenate([frame.numbers for frame in frames])
+
+    # compute radial soap vectors as first pass
+    hypers = dict(
+        soap_type="PowerSpectrum",
+        interaction_cutoff=2.5,
+        max_radial=6,
+        max_angular=0,
+        gaussian_sigma_type="Constant",
+        gaussian_sigma_constant=0.4,
+        cutoff_smooth_width=0.5,
+        normalize=False,
+        global_species=[1, 6, 7, 8],
+        expansion_by_species_method="user defined",
+    )
+    soap = SOAP(**hypers)
+
+    X_raw = StandardFlexibleScaler(column_wise=False).fit_transform(
+        soap.transform(frames).get_features(soap)
+    )
+
+    # rank the environments in terms of diversity
+    n_samples = 500
+    i_selected = FPS(n_to_select=n_samples, initialize=0).fit(X_raw).selected_idx_
+
+    # book-keep which frames these samples belong in
+    f_selected = center_idx[i_selected]
+    reduced_f_selected = list(sorted(set(f_selected)))
+    frames_selected = frames[f_selected].copy()
+    ci_selected = i_selected - n_env_accum[f_selected]
+
+    properties_select = [
+        frames[fi].arrays["CS_local"][ci] for fi, ci in zip(f_selected, ci_selected)
+    ]
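
The book-keeping at the end of the reference code maps each selected environment back to its frame and to its position within that frame. The sketch below replays that index arithmetic in pure Python on made-up data. The values of `n_centers` and `i_selected` are hypothetical, and plain lists replace the NumPy arrays of the snippet above (which uses `np`, `read` and `SOAP` without showing where they come from).

```python
from itertools import accumulate

# Hypothetical stand-in for the frames: number of atomic centers per frame.
n_centers = [3, 2, 4]

# center_idx[k] is the frame that owns the k-th environment overall.
center_idx = [f for f, n in enumerate(n_centers) for _ in range(n)]

# n_env_accum[f] is the global index of the first environment of frame f.
n_env_accum = [0] + list(accumulate(n_centers))

# Hypothetical global environment indices, e.g. as returned by a selector.
i_selected = [0, 4, 7, 8]

# Frame of each selected environment, then its index local to that frame.
f_selected = [center_idx[i] for i in i_selected]
ci_selected = [i - n_env_accum[f] for i, f in zip(i_selected, f_selected)]

pairs = list(zip(f_selected, ci_selected))
print(pairs)
```

Each `(frame, local index)` pair is what the final list comprehension in the reference code uses to look up the per-atom property.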

skmatter/datasets/descr/degenerate_CH4_manifold.rst

Lines changed: 4 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -49,5 +49,7 @@ The SOAP bispectrum features were in addition reduced to 12 features with princi
4949
References
5050
----------
5151

52-
.. [D1] https://github.com/lab-cosmo/librascal commit 8d9ad7a
53-
.. [D2] https://scikit-learn.org/stable/modules/generated/sklearn.decomposition.PCA.html
52+
.. [D1] https://github.com/lab-cosmo/librascal commit 8d9ad7a
53+
.. [D2] https://scikit-learn.org/stable/modules/generated/sklearn.decomposition.PCA.html
54+
55+
=======

skmatter/datasets/descr/nice_dataset.rst

Lines changed: 1 addition & 1 deletion
@@ -1,7 +1,7 @@
 .. _nice-dataset:

 NICE dataset
-##########
+############
 This is a toy dataset containing NICE[1, 4](N-body Iterative Contraction of Equivariants) features for first 500 configurations of the dataset[2, 3] with randomly displaced methane configurations.

 Function Call

skmatter/datasets/descr/who_dataset.rst

Lines changed: 2 additions & 2 deletions
@@ -1,7 +1,7 @@
 .. _who:

-who_data
-#########
+WHO dataset
+###########

 `who_dataset.csv` is a compilation of multiple publically-available datasets
 through data.worldbank.org. Specifically, the following versioned datasets are used:

skmatter/datasets/make_csd_1000r.py

Lines changed: 0 additions & 80 deletions
This file was deleted.
