1. A grammar for *L*
2. A set of input files written in *L* (the corpus)

Given these inputs, `CodeBuff` can then format a previously unseen file written in *L*.

`CodeBuff` trains a *k-nearest-neighbor* (*kNN*) machine learning model on the corpus. A *kNN* model is attractive because it is simple yet powerful and mirrors how programmers format code: they scan their memory for similar contexts and apply the same formatting rule, or the rule they apply most often.
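
The "find similar contexts, then follow the majority" idea can be sketched in a few lines of Python. This is a minimal illustration, not `CodeBuff`'s own implementation: the integer feature vectors, the integer category labels, the Hamming distance, and the `hamming`/`knn_predict` helpers are all hypothetical assumptions chosen for clarity.

```python
from collections import Counter
from typing import List, Sequence, Tuple

# Hypothetical training example: a context feature vector paired with the
# formatting category observed for that token in the corpus.
Example = Tuple[Tuple[int, ...], int]


def hamming(a: Sequence[int], b: Sequence[int]) -> int:
    """Count the feature positions where two context vectors differ."""
    return sum(x != y for x, y in zip(a, b))


def knn_predict(corpus: List[Example], features: Sequence[int], k: int = 3) -> int:
    """Return the majority category among the k corpus contexts closest to `features`.

    Assumption: Hamming distance over categorical features. Ties in the vote go
    to the category that appears first among the distance-sorted neighbors.
    """
    nearest = sorted(corpus, key=lambda ex: hamming(ex[0], features))[:k]
    votes = Counter(category for _, category in nearest)
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    # Toy corpus; categories 0 and 1 could stand for, say, "inject 0 newlines"
    # and "inject 1 newline" (hypothetical encoding).
    corpus = [
        ((1, 0, 2), 0),
        ((1, 0, 3), 0),
        ((2, 1, 2), 1),
        ((2, 1, 3), 1),
        ((2, 1, 1), 1),
    ]
    print(knn_predict(corpus, (1, 0, 1)))  # 0
    print(knn_predict(corpus, (2, 1, 0)))  # 1
```

No model is "fit" in the usual sense: the corpus itself is the model, and all the work happens at prediction time, which is why the choice of features and distance matters so much.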

The model is limited by the amount of context its features (independent variables) capture. It can also only express alignment or indentation relative to a previous token in the token stream.

## Mechanism

The prediction categories and features used by the *kNN* model are critical to the success of `CodeBuff`.

### Prediction categories

For a given token and its parse-tree context (relative to the current token), we predict one of:

1. inject *n>=0* newlines
2. inject *n>=0* spaces
3. align with the child at *index>=0* of the ancestor *delta>=0* steps up the parse-tree parent chain
4. indent from the first token of the ancestor *delta>=0* steps up the parse-tree parent chain (the number of spaces per indent is a parameter to the formatter)
5. indent from the start of the previous line
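
Categories 1 and 2 need no tree walk: they emit *n* newlines or *n* spaces. Categories 3 to 5 instead resolve to a target column, which this minimal Python sketch makes concrete. The `Node` class, its `column` attribute, and the helper functions are hypothetical, not `CodeBuff`'s API; the sketch assumes `delta = 0` means the current token's own node and that each node already knows the column of its first token.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """Hypothetical parse-tree node; `column` is where its first token starts."""
    column: int
    children: List[Node] = field(default_factory=list)
    # Excluded from repr/eq so printing or comparing nodes cannot recurse forever.
    parent: Optional[Node] = field(default=None, repr=False, compare=False)

    def add(self, child: Node) -> Node:
        """Attach `child` under this node and return it."""
        child.parent = self
        self.children.append(child)
        return child


def ancestor(node: Node, delta: int) -> Node:
    """Walk `delta` steps up the parent chain; delta == 0 is the node itself."""
    for _ in range(delta):
        if node.parent is None:
            raise ValueError("delta walks past the root")
        node = node.parent
    return node


def align_column(node: Node, delta: int, index: int) -> int:
    """Category 3: column of child `index` of the ancestor `delta` steps up."""
    return ancestor(node, delta).children[index].column


def indent_column(node: Node, delta: int, indent_size: int) -> int:
    """Category 4: one indent level past the ancestor's first token."""
    return ancestor(node, delta).column + indent_size


def indent_from_previous_line(previous_line: str, indent_size: int) -> int:
    """Category 5: one indent level past the previous line's leading spaces."""
    leading = len(previous_line) - len(previous_line.lstrip(" "))
    return leading + indent_size


if __name__ == "__main__":
    call = Node(column=4)                    # e.g. a call expression at column 4
    first_arg = call.add(Node(column=9))     # first argument at column 9
    second_arg = call.add(Node(column=0))    # column still to be decided
    print(align_column(second_arg, delta=1, index=0))         # 9
    print(indent_column(second_arg, delta=1, indent_size=4))  # 8
    print(indent_from_previous_line("    foo(a,", indent_size=4))  # 8
```

Here the second argument of a call could either line up with the first argument (category 3) or be indented one level from the call itself (category 4); the *kNN* model picks between such options based on what the corpus did in similar contexts.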

For efficiency, we use just two classifiers: one predicts the injection of newlines or spaces, and the other predicts alignment or indentation. The result of prediction is a tuple: