Commit d1be2f7

committed
update summary
1 parent 1ab43de commit d1be2f7

File tree

1 file changed

+25
-1
lines changed

README.md

Lines changed: 25 additions & 1 deletion
@@ -28,14 +28,38 @@ To make this approach work, we need a model that maps context information about
2828

2929
1. A grammar for *L*
3030
2. A set of input files written in *L*
31-
3. A file written in *L* but not in the corpus that you would like to format
31+
32+
It can then format a previously unseen file written in *L*.
3233

3334
`CodeBuff` trains a *k-Nearest-Neighbor* (*kNN*) machine learning model based upon the corpus. The *kNN* model is particularly attractive because it is very powerful yet simple and mirrors how programmers format code. Programmers scan their memory for similar context situations and apply the same rule or the rule they do most often.
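
As a rough illustration of the kNN idea described above, the sketch below votes among the *k* corpus contexts most similar to the current one. The feature vectors, the mismatch-count distance, and the names `distance` and `knn_predict` are assumptions made for this example. They are not taken from `CodeBuff`'s source.

```python
from collections import Counter

def distance(a, b):
    """Count the positions at which two equal-length feature vectors differ."""
    return sum(x != y for x, y in zip(a, b))

def knn_predict(corpus, features, k=3):
    """Return the most common category among the k nearest corpus exemplars.

    corpus is a list of (feature_vector, category) pairs (hypothetical layout).
    """
    nearest = sorted(corpus, key=lambda ex: distance(ex[0], features))[:k]
    votes = Counter(category for _, category in nearest)
    return votes.most_common(1)[0][0]

# Toy corpus: (previous token, current token, parent rule) -> formatting decision.
corpus = [
    (("ID", "=", "expr"), ("whitespace", 1)),
    (("ID", "=", "call"), ("whitespace", 1)),
    ((";", "ID", "stat"), ("newline", 1)),
    (("{", "ID", "stat"), ("newline", 1)),
    (("ID", "(", "call"), ("whitespace", 0)),
]

print(knn_predict(corpus, ("ID", "=", "stat")))
```

The query context matches two `=` exemplars in two of three positions, so the vote favors injecting a single space, much as a programmer would reuse the rule applied most often in similar situations.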
3435

36+
The model is limited by the amount of context that the features (independent variables) provide. It can also only predict alignment or indentation relative to a previous token in the token stream.
37+
3538
## Mechanism
3639

40+
The prediction categories and features used by the kNN model are critical to the success of `CodeBuff`.
41+
42+
### Prediction categories
43+
44+
For a given token and parse tree context (relative to current token), we predict one of:
45+
46+
1. inject *n>=0* newlines
47+
2. inject *n>=0* spaces
48+
3. align with child at *index>=0* of ancestor *delta>=0* steps up parse-tree parent chain
49+
4. indent from first token of ancestor *delta>=0* steps up parse-tree parent chain (the number of spaces per indent is a parameter to the formatter)
50+
5. indent from start of previous line
51+
52+
For efficiency, we use just two classifiers: one predicts injection of newlines/spaces and the other predicts alignment/indentation. Each prediction is either a tuple or a bare category:
53+
54+
**predict<sub>ws</sub>**(*context*) &isin; {(newline, *n*), (whitespace, *n*), none}
55+
56+
**predict<sub>align</sub>**(*context*) &isin; {(align, *delta*, *index*), (indent, *delta*), indent, none}
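
One way to read the two prediction results is as small tagged values that a formatter turns into text or a target column. The sketch below is only an interpretation under stated assumptions: the `ancestors` list of dictionaries, the field names `first_token_column` and `child_columns`, and the functions `apply_ws` and `target_column` are hypothetical, and indentation is assumed to mean "reference column plus `indent_size`".

```python
def apply_ws(prediction):
    """Turn a predict_ws result into the text to emit before the token."""
    if prediction is None:  # "none": inject nothing
        return ""
    kind, n = prediction
    if kind == "newline":
        return "\n" * n
    if kind == "whitespace":
        return " " * n
    raise ValueError(f"unknown whitespace category: {kind!r}")

def target_column(prediction, ancestors, prev_line_column, indent_size=4):
    """Turn a predict_align result into a column, or None for "none".

    ancestors[delta] describes the node delta steps up the parent chain
    (hypothetical layout; columns are zero-based).
    """
    if prediction is None:
        return None
    if prediction == "indent":  # indent from start of previous line
        return prev_line_column + indent_size
    kind = prediction[0]
    if kind == "align":  # align with child `index` of the chosen ancestor
        _, delta, index = prediction
        return ancestors[delta]["child_columns"][index]
    if kind == "indent":  # indent from the chosen ancestor's first token
        _, delta = prediction
        return ancestors[delta]["first_token_column"] + indent_size
    raise ValueError(f"unknown alignment category: {kind!r}")

# Two made-up ancestors with the columns of their first token and children.
ancestors = [
    {"first_token_column": 8, "child_columns": [8, 12, 20]},
    {"first_token_column": 4, "child_columns": [4, 8]},
]

print(repr(apply_ws(("newline", 2))))
print(target_column(("align", 0, 2), ancestors, prev_line_column=4))
print(target_column(("indent", 1), ancestors, prev_line_column=4))
```

Keeping the whitespace and alignment results separate mirrors the two-classifier split: the first decides how much to inject, and the second decides where the token should land on the line.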
57+
3758
### Features
3859

60+
matching symbols and list elements
61+
62+
3963
1. INDEX_PREV_TYPE
4064
1. INDEX_PREV_EARLIEST_RIGHT_ANCESTOR
4165
1. INDEX_CUR_TYPE
