Commit e39f13b
authored
Create LakeFS commits RDD directly without using an input format (#9657)
* Create LakeFS commits RDD directly without using an input format
Garbage collection (and all other uses) use LakeFSContext.newRDD to create
the "ranges RDD". Creating them explicitly with Spark operators means Spark
can parallelize reading all metaranges and ranges.
<h2>How much faster?</h2>
I have a small repo with many small commits. I enabled GC for it. Here are
summaries from two sample mark-only runs.
<h3>Direct RDD (this code)</h3>
Runtime: 2m33s
```json
{
"run_id": "g4uk6erfnfus73frbnqg",
"success": true,
"first_slice": "g5adr8f5pvec73cpia80",
"start_time": "2025-11-10T10:34:37.245361091Z",
"cutoff_time": "2025-11-10T04:34:37.243Z",
"num_deleted_objects": 147942
}
```
<h3>File format RDD (previous code)</h3>
Runtime: 3m52s
```json
{
"run_id": "g4uinaarakss73aoeel0",
"success": true,
"first_slice": "g5adr8f5pvec73cpia80",
"start_time": "2025-11-10T12:15:11.097697745Z",
"cutoff_time": "2025-11-10T06:15:11.096Z",
"num_deleted_objects": 147942
}
```
<h3>Summary</h3>
- The same number of objects were marked for deletion.
- The _same_ objects were marked for deletion on both.
- New code takes 0.65 the time of the old code.
* scalafmt
D'oh.
* [bug] Delete file, close SSTableReader after iterating
All errors during closing are _logged_ but do not fail the task: these are
readonly objects, so bad closes can do no more than leak (on "reasonable"
systems).
Flagged by **Copilot**, hurrah for verifiable actionable suggestions!
* Also parallelize directory listing
Read objects in parallel:
- from directory listing;
- from commits
Default parallelism is no good for either of these, because it is based on #
of CPUs - and we want a _lot_ more.
New configuration option `lakefs.job.range_read_parallelism` configures
this parallelism.
* scalafmt1 parent 8232830 commit e39f13b
File tree
6 files changed
+111
-23
lines changed- clients/spark/src
- main/scala/io/treeverse
- clients
- gc
- test/scala/io/treeverse/gc
6 files changed
+111
-23
lines changedLines changed: 83 additions & 11 deletions
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
2 | 2 | | |
3 | 3 | | |
4 | 4 | | |
5 | | - | |
| 5 | + | |
6 | 6 | | |
7 | | - | |
| 7 | + | |
8 | 8 | | |
9 | 9 | | |
10 | 10 | | |
| 11 | + | |
| 12 | + | |
11 | 13 | | |
| 14 | + | |
12 | 15 | | |
13 | 16 | | |
14 | 17 | | |
| |||
61 | 64 | | |
62 | 65 | | |
63 | 66 | | |
| 67 | + | |
| 68 | + | |
64 | 69 | | |
65 | 70 | | |
66 | 71 | | |
| |||
71 | 76 | | |
72 | 77 | | |
73 | 78 | | |
| 79 | + | |
| 80 | + | |
| 81 | + | |
74 | 82 | | |
75 | 83 | | |
76 | 84 | | |
| |||
108 | 116 | | |
109 | 117 | | |
110 | 118 | | |
| 119 | + | |
| 120 | + | |
111 | 121 | | |
112 | 122 | | |
113 | 123 | | |
114 | 124 | | |
115 | | - | |
116 | | - | |
117 | | - | |
| 125 | + | |
118 | 126 | | |
119 | | - | |
120 | 127 | | |
121 | 128 | | |
122 | 129 | | |
| |||
131 | 138 | | |
132 | 139 | | |
133 | 140 | | |
134 | | - | |
135 | | - | |
136 | | - | |
137 | | - | |
138 | | - | |
| 141 | + | |
| 142 | + | |
| 143 | + | |
| 144 | + | |
| 145 | + | |
| 146 | + | |
| 147 | + | |
| 148 | + | |
139 | 149 | | |
| 150 | + | |
| 151 | + | |
| 152 | + | |
| 153 | + | |
| 154 | + | |
| 155 | + | |
| 156 | + | |
| 157 | + | |
| 158 | + | |
| 159 | + | |
| 160 | + | |
| 161 | + | |
| 162 | + | |
| 163 | + | |
| 164 | + | |
| 165 | + | |
| 166 | + | |
| 167 | + | |
| 168 | + | |
| 169 | + | |
| 170 | + | |
| 171 | + | |
| 172 | + | |
| 173 | + | |
| 174 | + | |
| 175 | + | |
| 176 | + | |
| 177 | + | |
| 178 | + | |
| 179 | + | |
| 180 | + | |
| 181 | + | |
| 182 | + | |
| 183 | + | |
| 184 | + | |
| 185 | + | |
| 186 | + | |
| 187 | + | |
| 188 | + | |
| 189 | + | |
| 190 | + | |
| 191 | + | |
| 192 | + | |
| 193 | + | |
| 194 | + | |
| 195 | + | |
| 196 | + | |
| 197 | + | |
| 198 | + | |
| 199 | + | |
| 200 | + | |
| 201 | + | |
| 202 | + | |
| 203 | + | |
| 204 | + | |
| 205 | + | |
| 206 | + | |
| 207 | + | |
| 208 | + | |
| 209 | + | |
| 210 | + | |
| 211 | + | |
140 | 212 | | |
141 | 213 | | |
142 | 214 | | |
| |||
Lines changed: 4 additions & 1 deletion
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
169 | 169 | | |
170 | 170 | | |
171 | 171 | | |
172 | | - | |
| 172 | + | |
| 173 | + | |
| 174 | + | |
| 175 | + | |
173 | 176 | | |
174 | 177 | | |
175 | 178 | | |
| |||
Lines changed: 10 additions & 1 deletion
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
9 | 9 | | |
10 | 10 | | |
11 | 11 | | |
| 12 | + | |
12 | 13 | | |
13 | 14 | | |
14 | 15 | | |
| |||
101 | 102 | | |
102 | 103 | | |
103 | 104 | | |
| 105 | + | |
| 106 | + | |
104 | 107 | | |
105 | 108 | | |
106 | 109 | | |
| |||
110 | 113 | | |
111 | 114 | | |
112 | 115 | | |
113 | | - | |
| 116 | + | |
| 117 | + | |
| 118 | + | |
| 119 | + | |
| 120 | + | |
| 121 | + | |
| 122 | + | |
114 | 123 | | |
115 | 124 | | |
116 | 125 | | |
| |||
Lines changed: 5 additions & 4 deletions
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
11 | 11 | | |
12 | 12 | | |
13 | 13 | | |
14 | | - | |
| 14 | + | |
15 | 15 | | |
16 | 16 | | |
17 | 17 | | |
18 | | - | |
| 18 | + | |
19 | 19 | | |
20 | 20 | | |
21 | 21 | | |
| |||
24 | 24 | | |
25 | 25 | | |
26 | 26 | | |
27 | | - | |
| 27 | + | |
28 | 28 | | |
29 | 29 | | |
30 | 30 | | |
| |||
47 | 47 | | |
48 | 48 | | |
49 | 49 | | |
50 | | - | |
| 50 | + | |
51 | 51 | | |
52 | 52 | | |
53 | 53 | | |
| |||
63 | 63 | | |
64 | 64 | | |
65 | 65 | | |
| 66 | + | |
66 | 67 | | |
67 | 68 | | |
68 | 69 | | |
| |||
Lines changed: 6 additions & 3 deletions
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
6 | 6 | | |
7 | 7 | | |
8 | 8 | | |
| 9 | + | |
9 | 10 | | |
10 | 11 | | |
11 | 12 | | |
| |||
43 | 44 | | |
44 | 45 | | |
45 | 46 | | |
| 47 | + | |
| 48 | + | |
46 | 49 | | |
47 | 50 | | |
48 | 51 | | |
| |||
54 | 57 | | |
55 | 58 | | |
56 | 59 | | |
57 | | - | |
| 60 | + | |
58 | 61 | | |
59 | 62 | | |
60 | 63 | | |
| |||
65 | 68 | | |
66 | 69 | | |
67 | 70 | | |
68 | | - | |
| 71 | + | |
69 | 72 | | |
70 | 73 | | |
71 | 74 | | |
| |||
195 | 198 | | |
196 | 199 | | |
197 | 200 | | |
198 | | - | |
| 201 | + | |
199 | 202 | | |
200 | 203 | | |
201 | 204 | | |
| |||
Lines changed: 3 additions & 3 deletions
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
52 | 52 | | |
53 | 53 | | |
54 | 54 | | |
55 | | - | |
| 55 | + | |
56 | 56 | | |
57 | 57 | | |
58 | 58 | | |
| |||
81 | 81 | | |
82 | 82 | | |
83 | 83 | | |
84 | | - | |
| 84 | + | |
85 | 85 | | |
86 | 86 | | |
87 | 87 | | |
| |||
127 | 127 | | |
128 | 128 | | |
129 | 129 | | |
130 | | - | |
| 130 | + | |
131 | 131 | | |
132 | 132 | | |
133 | 133 | | |
| |||
0 commit comments