-| `dataframe` | Clojask.DataFrame | The operated object | |
-| `trending list` | Collection (seq vector) | Indicates the sort order | Example: ["Salary" "+" "Employee" "-"] means that sort the Salary in ascending order, if equal sort the Employee in descending order |
-| `output-directory` | String | The output path | |
-
-**Example**
-
-```clojure
-(sort y ["+" "Salary"] "resources/sort.csv")
-;; sort by Salary ascendingly
-```
-
-
 
-- compute
 
-Compute the result. The pre-defined lazy operations will be executed in pipeline, ie the result of the previous operation becomes the argument of the next operation.
-| `dataframe` | Clojask.DataFrame | The operated object | |
-| `num of workers` | int (max 8) | The number of worker instances (except the input and output nodes) | If this argument >= 2, will use [onyx](http://www.onyxplatform.org/) as the distributed platform |
-| `output path` | String | The path of the output csv file | Could exist or not. |
-| [`exception`] | boolean | Whether an exception during calculation will cause termination | Is useful for debugging or detecting empty fields |
-
-**Example**
-
-```clojure
-(compute x 8 "../resources/test.csv" :exception true)
-;; computes all the pre-registered operations
-```
-
-
 
 - inner-join / left-join / right-join
 
@@ -258,40 +220,52 @@ You can also group by the combination of keys. (Use the above two rules together
 
 *Will automatically pipeline the registered operations and filters like `compute`. You could think of join as first compute the two dataframes then join.*
+| `dataframe a` | Clojask.DataFrame | The operated object | |
+| `dataframe b` | Clojask.DataFrame | The operated object | |
+| `a join keys` | String / Collection | The keys of a to be aligned | Find the specification [here](#groupby-keys) |
+| `b join keys` | String / Collection | The keys of b to be aligned | Find the specification [here](#groupby-keys) |
 
-**Example**
+**Return**
+
+A Clojask.JoinedDataFrame
+
+- Unlike Clojask.DataFrame, it only supports three operations:
+  - `print-df`
+  - `get-col-names`
+  - `compute`
+- This means you cannot further apply complicated operations to a joined dataframe. An alternative is to first compute the result, then read it in as a new dataframe.
+
+**Example**
 
 ```clojure
 (def x (dataframe "path/to/a"))
 (def y (dataframe "path/to/b"))
 
-(inner-join x y ["col a 1" "col a 2"] ["col b 1" "col b 2"] 8 "path/to/distination" :exception true)
+(def z (inner-join x y ["col a 1" "col a 2"] ["col b 1" "col b 2"]))
+(compute z 8 "path/to/output")
 ;; inner join x and y
 
-(left-join x y ["col a 1" "col a 2"] ["col b 1" "col b 2"] 8 "path/to/distination" :exception true)
+(def z (left-join x y ["col a 1" "col a 2"] ["col b 1" "col b 2"]))
+(compute z 8 "path/to/output")
 ;; left join x and y
 
-(right-join x y ["col a 1" "col a 2"] ["col b 1" "col b 2"] 8 "path/to/distination" :exception true)
+(def z (right-join x y ["col a 1" "col a 2"] ["col b 1" "col b 2"]))
+(compute z 8 "path/to/output")
 ;; right join x and y
 ```
 
+
+
 - reorderCol / renameCol
 
 Reorder the columns / rename the column names in the dataframe
+| `dataframe` | Clojask.DataFrame | The operated object | |
+| `trending list` | Collection (seq vector) | Indicates the sort order | Example: ["Salary" "+" "Employee" "-"] means that sort the Salary in ascending order, if equal sort the Employee in descending order |
+| `output-directory` | String | The output path | |
+
+**Example**
+
+```clojure
+(sort y ["+" "Salary"] "resources/sort.csv")
+;; sort by Salary ascendingly
+```
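
The ordering rule stated for the trending list (sort by the first column, break ties with the next) can be illustrated on in-memory data. The following is a minimal sketch in plain Clojure, not Clojask's implementation: `trend-comparator`, `rows`, and `sorted-rows` are hypothetical names, the rows are made-up maps, and the list is assumed to follow the column-then-direction order shown in the table's example.

```clojure
;; Hypothetical helper, not part of Clojask: turns a flat trending list such as
;; ["Salary" "+" "Employee" "-"] into a comparator over row maps.
(defn trend-comparator
  [trending]
  (let [pairs (partition 2 trending)]          ; (["Salary" "+"] ["Employee" "-"])
    (fn [row-a row-b]
      (or (first
            (for [[col dir] pairs
                  :let [c (compare (get row-a col) (get row-b col))]
                  :when (not (zero? c))]       ; skip columns where the rows tie
              (if (= dir "+") c (- c))))       ; "-" flips the comparison
          0))))                                ; every listed column tied

;; Made-up sample rows, for illustration only.
(def rows
  [{"Employee" "Bob"   "Salary" 300}
   {"Employee" "Alice" "Salary" 300}
   {"Employee" "Carol" "Salary" 200}])

;; clojure.core/sort here, not Clojask's sort.
(def sorted-rows
  (clojure.core/sort (trend-comparator ["Salary" "+" "Employee" "-"]) rows))
```

With these rows, the lowest salary comes first, and the two rows with equal salary are ordered by employee name in descending order.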
+
+
+
+- compute
+
+Compute the result. The pre-defined lazy operations will be executed in pipeline, ie the result of the previous operation becomes the argument of the next operation.
+| `dataframe` | Clojask.DataFrame | The operated object | |
+| `num of workers` | int (max 8) | The number of worker instances (except the input and output nodes) | Use [onyx](http://www.onyxplatform.org/) as the distributed platform |
+| `output path` | String | The path of the output csv file | Could exist or not. |
+| [`exception`] | boolean | Whether an exception during calculation will cause termination | Is useful for debugging or detecting empty fields |
+| [`select`] | String / Collection of strings | The name of the columns to select. Better to first refer to function `get-col-names` about all the names. (Similar to `SELECT` in sql ) | Can only specify either of select and exclude |
+| [`exclude`] | String / Collection of strings | The name of the columns to exclude | Can only specify either of select and exclude |
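
The `select` / `exclude` rules in the table (a single string or a collection of names, only one of the two at a time) can be sketched on a plain vector of column names. This is an illustration in plain Clojure under stated assumptions, not how Clojask resolves columns internally: `pick-columns` and `cols` are hypothetical, and throwing when both options are given is an assumed behaviour.

```clojure
;; Hypothetical helper, not part of Clojask: returns the column names that would
;; be kept, given all column names and either :select or :exclude.
(defn pick-columns
  [col-names & {:keys [select exclude]}]
  (let [as-coll #(if (string? %) [%] (vec %))] ; accept "col a" or ["col a" ...]
    (cond
      ;; Assumed behaviour: reject the call when both options are supplied.
      (and select exclude)
      (throw (IllegalArgumentException.
               "Specify only one of :select and :exclude"))

      ;; Selected columns are kept in the order they were requested.
      select (as-coll select)

      ;; Remaining columns keep their original order.
      exclude (vec (remove (set (as-coll exclude)) col-names))

      :else col-names)))

;; Made-up column names, for illustration only.
(def cols ["col a" "col b" "col c"])

(pick-columns cols :select ["col b" "col a"])
(pick-columns cols :exclude ["col b" "col a"])
```

The first call keeps column b and column a in the requested order; the second keeps every column except those two.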
+
+**Example**
+
+```clojure
+(compute x 8 "../resources/test.csv" :exception true)
+;; computes all the pre-registered operations
+
+(compute x 8 "../resources/test.csv" :select "col a")
+;; only select column a
+
+(compute x 8 "../resources/test.csv" :select ["col b" "col a"])
+;; select two columns, column b and column a in order
+
+(compute x 8 "../resources/test.csv" :exclude ["col b" "col a"])
+;; select all columns except column b and column a, other columns are in order