docs/02_Introduction_into_Metafacture-Flux.md
Lines changed: 23 additions & 24 deletions
@@ -38,7 +38,7 @@ See the result below? It is `Hello, friend. I'am Metafacture!`.
38
38
But what have we done here?
39
39
We have a short text string `"Hello, friend. I'am Metafacture"`. That is printed with the modul `print`.
40
40
41
-
A Metafacture Workflow is nothing else than an incoming text string that is manipulated by one or multiple moduls that do something with the incoming string.
41
+
A Metafacture Workflow is nothing else than an incoming text string that is manipulated by one or multiple modules that do something with the incoming string.
42
42
However, the workflow does not have to start with a text string but can also be a variable that stands for the text string and needs to be defined before the workflow. As this:
43
43
44
44
```text
@@ -93,8 +93,7 @@ inputFile
93
93
```
94
94
95
95
The inputFile is opened as a file (`open-file`) and then processed line by line (`as-line`).
96
-
You can see that in this [sample](https://metafacture.org/playground/?flux=inputFile%0A%7Copen-file%0A%7Cas-lines%0A%7Cprint%0A%3B&data=Hello%2C+friend.+I%27am+Metafacture%21).
Have a look at this [sample](https://metafacture.org/playground/?flux=inputFile%0A%7Copen-file%0A%7Cas-lines%0A%7Cprint%0A%3B&data=Hello%2C+friend.+I%27am+Metafacture%21).
98
97
99
98
We usually do not start with any random text strings but with data. So lets play around with some data.
100
99
@@ -108,7 +107,7 @@ You will see data that look like this:
108
107
109
108
This is data in JSON format. But it seems not very readable.
110
109
111
-
But all these fields tell something about a publication, a book, with 268 pages and title Ordinary Vices by Judith N. Shklar.
110
+
All these fields tell us something about a publication, a book, with 268 pages and title "Ordinary Vices" by Judith N. Shklar.
112
111
113
112
Let's copy the JSON data into our `ìnputFile-content` field. [And run it again](https://metafacture.org/playground/?flux=inputFile%0A%7Copen-file%0A%7Cas-lines%0A%7Cprint%0A%3B&data=%7B%22publishers%22%3A+%5B%22Belknap+Press+of+Harvard+University+Press%22%5D%2C+%22identifiers%22%3A+%7B%22librarything%22%3A+%5B%22321843%22%5D%2C+%22goodreads%22%3A+%5B%222439014%22%5D%7D%2C+%22covers%22%3A+%5B413726%5D%2C+%22local_id%22%3A+%5B%22urn%3Atrent%3A0116301499939%22%2C+%22urn%3Asfpl%3A31223009984353%22%2C+%22urn%3Asfpl%3A31223011345064%22%2C+%22urn%3Acst%3A10017055762%22%5D%2C+%22lc_classifications%22%3A+%5B%22JA79+.S44+1984%22%2C+%22HM216+.S44%22%2C+%22JA79.S44+1984%22%5D%2C+%22key%22%3A+%22/books/OL2838758M%22%2C+%22authors%22%3A+%5B%7B%22key%22%3A+%22/authors/OL381196A%22%7D%5D%2C+%22ocaid%22%3A+%22ordinaryvices0000shkl%22%2C+%22publish_places%22%3A+%5B%22Cambridge%2C+Mass%22%5D%2C+%22subjects%22%3A+%5B%22Political+ethics.%22%2C+%22Liberalism.%22%2C+%22Vices.%22%5D%2C+%22pagination%22%3A+%22268+p.+%3B%22%2C+%22source_records%22%3A+%5B%22marc%3AOpenLibraries-Trent-MARCs/tier5.mrc%3A4020092%3A744%22%2C+%22marc%3Amarc_openlibraries_sanfranciscopubliclibrary/sfpl_chq_2018_12_24_run01.mrc%3A195791766%3A1651%22%2C+%22ia%3Aordinaryvices0000shkl%22%2C+%22marc%3Amarc_claremont_school_theology/CSTMARC1_barcode.mrc%3A137174387%3A3955%22%2C+%22bwb%3A9780674641754%22%2C+%22marc%3Amarc_loc_2016/BooksAll.2016.part15.utf8%3A115755952%3A680%22%2C+%22marc%3Amarc_claremont_school_theology/CSTMARC1_multibarcode.mrc%3A137367696%3A3955%22%2C+%22ia%3Aordinaryvices0000shkl_a5g0%22%2C+%22marc%3Amarc_columbia/Columbia-extract-20221130-001.mrc%3A328870555%3A1311%22%2C+%22marc%3Aharvard_bibliographic_metadata/ab.bib.01.20150123.full.mrc%3A156768969%3A815%22%5D%2C+%22title%22%3A+%22Ordinary+vices%22%2C+%22dewey_decimal_class%22%3A+%5B%22172%22%5D%2C+%22notes%22%3A+%7B%22type%22%3A+%22/type/text%22%2C+%22value%22%3A+%22Bibliography%3A+p.+251-260.\nIncludes+index.%22%7D%2C+%22number_of_pages%22%3A+268%2C+%22languages%22%3A+%5B%7B%22key%22%3A+%22/languages/eng%22%7D%5D%2C+%22lccn%22%3A+%5B%2284000531%22%5D%2C+%22isbn_10%22%3A+%5B%220674641752%22%5D%2C+%22publish_date%22%3A+%221984%22%2C+%22publish_country%22%3A+%22mau%22%2C+%22by_statement%22%3A+%22Judith+N.+Shklar.%22%2C+%22works%22%3A+%5B%7B%22key%22%3A+%22/works/OL2617047W%22%7D%5D%2C+%22type%22%3A+%7B%22key%22%3A+%22/type/edition%22%7D%2C+%22oclc_numbers%22%3A+%5B%2210348450%22%5D%2C+%22latest_revision%22%3A+16%2C+%22revision%22%3A+16%2C+%22created%22%3A+%7B%22type%22%3A+%22/type/datetime%22%2C+%22value%22%3A+%222008-04-01T03%3A28%3A50.625462%22%7D%2C+%22last_modified%22%3A+%7B%22type%22%3A+%22/type/datetime%22%2C+%22value%22%3A+%222024-12-27T16%3A46%3A50.181109%22%7D%7D).
114
113
@@ -117,14 +116,12 @@ The output in result is the same as the input and it is still not very readable.
117
116
Lets turn the one line of JSON data into YAML. YAML is another format for structured information which is a bit easier to read for human eyes.
118
117
In order to change the serialization of the data we need to decode the data and then encode the data.
119
118
120
-
Metafacture has lots of decoder and encoder modules for specific data formats that can be used in an Flux workflow.
119
+
Metafacture has lots of decoder and encoder modules for specific data formats that can be used in a Flux workflow.
121
120
122
121
Let's try this out. Add the module `decode-json` and `encode-yaml` to your Flux Workflow.
123
122
124
123
The Flux should now look like this:
125
124
126
-
Flux:
127
-
128
125
```text
129
126
inputFile
130
127
| open-file
@@ -217,7 +214,7 @@ Luckily, we cannot only open the data we have in our `inputFile-content` field,
217
214
218
215
Clear your playground and copy the following Flux workflow:
219
216
220
-
```
217
+
```text
221
218
"https://openlibrary.org/books/OL2838758M.json"
222
219
| open-http
223
220
| as-lines
@@ -227,22 +224,24 @@ Clear your playground and copy the following Flux workflow:
227
224
;
228
225
```
229
226
230
-
The [result in the playground](https://metafacture.org/playground/?flux=%22https%3A//openlibrary.org/books/OL2838758M.json%22%0A%7C+open-http%0A%7C+as-lines%0A%7C+decode-json%0A%7C+encode-yaml%0A%7C+print%0A%3B) should be the same as before without having to paste anything into the text field. We just used the module `open-http` and directly retrieved the data from the URL.
227
+
The [result in the playground](https://metafacture.org/playground/?flux=%22https%3A//openlibrary.org/books/OL2838758M.json%22%0A%7C+open-http%0A%7C+as-lines%0A%7C+decode-json%0A%7C+encode-yaml%0A%7C+print%0A%3B) should be the same as before without having to paste anything into the text field. We just used the module `open-http` to directly retrieve the data from the URL.
231
228
232
-
Let's take a look what a Flux workflow does. The Flux workflow is combination of different moduls to process incoming structured data. In our example we have different things that we do with these modules:
229
+
Let's take a look at what a Flux workflow does. The Flux workflow is a combination of different modules to process incoming structured data. In our example we have different things that we do with these modules:
233
230
234
231
1. We have a URL as input. The URL localizes the data on the web.
235
-
2. We tell Metafacture to request the stated url using `open-http`.
232
+
2. We tell Metafacture to request the stated URL using `open-http`.
236
233
3. Then we define how to handle the incoming data: since the JSON is written in one line, we tell Metafacture to regard every new line as a new record with `as-lines`
237
-
4. Afterwards we tell Metafacture to `decode-json` in order to translate the incoming data as json to the generic internal data model that is called metadata events
234
+
4. Afterwards we tell Metafacture to `decode-json` in order to translate the incoming data as JSON to the generic internal data model that is called metadata events
238
235
5. Then we instruct Metafacture to serialize the metadata events as YAML with `encode-yaml`
239
236
6. Finally, we tell MF to `print` everything.
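The six steps above map onto the workflow line by line. A minimal annotated sketch, reusing the workflow from this lesson (the `//` comments are added here for illustration and are not part of the original file):

```text
// 1. The URL that locates the data on the web
"https://openlibrary.org/books/OL2838758M.json"
// 2. Request the URL
| open-http
// 3. Treat every line as one record
| as-lines
// 4. Decode the JSON into metadata events
| decode-json
// 5. Serialize the metadata events as YAML
| encode-yaml
// 6. Print the result
| print
;
```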
240
237
241
-
So let's have a small recap of what we done and learned so far: * We played around with the Metafacture Playground.
242
-
* We learned that a Metafacture Flux workflow is a combination of modules with an inital text string or an variable.
238
+
So let's have a small recap of what we've done and learned so far:
239
+
240
+
* We've played around with the Metafacture Playground.
241
+
* We've learned that a Metafacture Flux workflow is a combination of modules with an inital text string or a variable.
243
242
* We got to know different modules like `open-http`, `as-lines`. `decode-json`, `encode-yaml`, `print`
244
243
245
-
More modules can be found in the [documentation of available flux commands](https://github.com/metafacture/metafacture-documentation/blob/master/flux-commands.html).
244
+
More modules can be found in the [documentation of available flux commands](https://metafacture.github.io/metafacture-documentation/docs/flux/flux-commands.html).
246
245
247
246
Now take some time and play around a little bit more and use some other modules.
248
247
@@ -268,16 +267,16 @@ Now take some time and play around a little bit more and use some other modules.
268
267
What you see with the modules `encode-formeta` and `write` is that modules can have further specification in brackets.
269
268
These can eiter be a string in `"..."` or attributes that define options as with `style=`.
270
269
271
-
One last thing you should learn on an abstract level is to grasp the general idea of Metafacture Flux workflows is that they have many different moduls through which the data is flowing.
272
-
The most abstract and most common process resemble the following steps:
270
+
One last thing you should learn on an abstract level to grasp the general idea of Metafacture Flux workflows is that they have many different modules through which the data is flowing.
271
+
The most abstract and most common process resembles the following steps:
This process is one that transforms incoming data in a way that is changed at the end.
275
+
This process chain transforms incoming data in distinct steps.
277
276
Each step can be done by one or a combination of multiple modules.
278
277
Modules are small tools that do parts of the complete task we want to do.
279
278
280
-
Each modul demands a certain input and give a certain output. This is called signature.
279
+
Each modul demands a certain input and gives a certain output. This is called signature.
281
280
e.g.:
282
281
283
282
The first modul `open-file` expects a string and provides read data (called reader).
@@ -286,12 +285,12 @@ This reader data can be passed on to a modul that accepts reader data e.g. in ou
286
285
287
286
If you have a look at the flux modul/command documentation then you see under signature which data a modul expects and which data it outputs.
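Read as a chain of signatures, the workflow from this lesson looks roughly like the sketch below. The type names in the comments are assumptions recalled from the Flux command documentation, so check them against the signature entry of each command:

```text
"https://openlibrary.org/books/OL2838758M.json"
| open-http    // String -> Reader
| as-lines     // Reader -> String
| decode-json  // String -> StreamReceiver (metadata events)
| encode-yaml  // StreamReceiver -> String
| print        // takes the resulting strings and prints them
;
```

The output type of each module has to match the input type of the module that follows it. This is why `decode-json` cannot follow `open-http` directly.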
288
287
289
-
The combination of moduls is a Flux workflow.
288
+
The combination of modules is called a "Flux workflow".
290
289
291
290
Each module is separated by a `|` and every workflow ends with a `;`.
292
291
Comments can be added with `//`.
293
292
294
-
See:
293
+
For example:
295
294
296
295
```
297
296
//input string:
@@ -319,7 +318,7 @@ Add the option: <code>prettyPrinting="true"</code> to the <code>encode-json</cod
319
318
320
319
321
320
322
-
2) Have a look at documentation of [`decode-xml`](https://metafacture.org/metafacture-documentation/docs/flux/flux-commands.html#decode-xml) what is different to `decode-json`? And what input does it expect and what output does it create (Hint: signature)?
321
+
2) Have a look at the documentation of [`decode-xml`](https://metafacture.org/metafacture-documentation/docs/flux/flux-commands.html#decode-xml). What is different to `decode-json`? And what input does it expect and what output does it create (hint: signature)?
323
322
324
323
<details>
325
324
<summary>Answer</summary>
@@ -329,7 +328,7 @@ The signature of <code>decode-xml</code> and <code>decode-json</code> is quiet d
<code>decode-xml</code> expects data from Reader output of <code>open-file</code> or <code>open-http</code>, and creates output that can be transformed by a specific xml <code>handler</code>. The xml parser of <code>decode-xml</code> works straight with read content of a file or a url.
331
+
<code>decode-xml</code> expects data from Reader output of <code>open-file</code> or <code>open-http</code>, and creates output that can be transformed by a specific XML <code>handler</code>. The XML parser of <code>decode-xml</code> works straight by reading the content of a file or a URL.
333
332
334
333
<code>decode-json</code> expects data from output of a string like <code>as-lines</code> or <code>as-records</code> and creates output that could be transformed by <code>fix</code> or encoded with a module like <code>encode-xml</code>. For the most decoding you have to specify how (<code>as-lines</code> or <code>as-records</code>) the incoming data is read.
335
334
</details>
@@ -354,7 +353,7 @@ Explanation:
354
353
355
354
As you surely already saw I mentioned transform as one step in a metafacture workflow.
356
355
357
-
But aside from changing the serialisation we did not play around with transformations yet.
356
+
But aside from changing the serialization we did not play around with transformations yet.
docs/03_Introduction_into_Metafacture-Fix.md
Lines changed: 14 additions & 14 deletions
@@ -7,15 +7,15 @@ parent: Tutorial
7
7
8
8
# Lesson 3: Introduction into Metafacture Fix
9
9
10
-
In the last session we learned about Flux moduls.
11
-
Flux moduls can do a lot of things. They configure the "high-level" transformation pipeline.
10
+
In the last session we've learned about Flux modules.
11
+
Flux modules can do a lot of things. They configure the "high-level" transformation pipeline.
12
12
13
-
But the main transformation of incoming data at record, elemenet and value level is usually done by the transformation moduls[Fix](https://metafacture.github.io/metafacture-documentation/docs/flux/flux-commands.html#fix) or [Morph](https://metafacture.github.io/metafacture-documentation/docs/flux/flux-commands.html#morph) as one step in the pipeline.
13
+
But the main transformation of incoming data at record, element and value level is usually done by the transformation modules[Fix](https://metafacture.github.io/metafacture-documentation/docs/flux/flux-commands.html#fix) or [Morph](https://metafacture.github.io/metafacture-documentation/docs/flux/flux-commands.html#morph) as one step in the pipeline.
14
14
15
15
By transformation we mean things like:
16
16
17
17
* Manipulating element names and element values
18
-
*Change hierachies and structures of records
18
+
*Changing hierachies and structures of records
19
19
* Lookup values in concordance list
20
20
21
21
But not changing serialization that is part of encoding and decoding.
@@ -47,10 +47,10 @@ You should end up with something like:
47
47
title: "Ordinary vices"
48
48
```
49
49
50
-
The Fix module, called by `fix`, in Metafacture is used to manipulate the input data filtering fields we would like to see. Only one fix-function was used: `retain`, which throws away all the data from the input except the stated `"title"` field. Normally all incoming data is passed through, unless it is somehow manipulated or a `retain` function is used.
50
+
The Fix module, called by `fix`, is used to manipulate the input data filtering fields we would like to see. Only one Fix-function was used: `retain`, which throws away all the data from the input except the stated `"title"` field. Normally all incoming data is passed through, unless it is somehow manipulated or a `retain` function is used.
51
51
52
-
HINT: As long as you embedd the fix functions in the Flux Workflow, you have to use double quotes to fence the fix functions,
53
-
and single quotes in the fix functions. As we did here: `fix ("retain('title')")`
52
+
HINT: As long as you embed the Fix functions in the Flux Workflow, you have to use double quotes to fence the Fix functions,
53
+
and single quotes in the Fix functions. As we did here: `fix ("retain('title')")`
54
54
55
55
Now let us additionally keep the info that is given in the element `"publish_date"` and the subfield `"key"` in `'type'` by adding `'publish_date', 'type.key'` to `retain`:
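A sketch of how that step could look inside the workflow from the previous lesson. It assumes the same Open Library record and only combines pieces already shown here: the outer double quotes fence the Fix, the single quotes mark the field names.

```text
"https://openlibrary.org/books/OL2838758M.json"
| open-http
| as-lines
| decode-json
| fix("retain('title', 'publish_date', 'type.key')")
| encode-yaml
| print
;
```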
56
56
@@ -76,9 +76,9 @@ notes:
76
76
77
77
```
78
78
79
-
When manipulating data you often need to create many fixes to process a data file in the format and structure you need. With a text editor you can write all fix functions in a singe separate Fix file.
79
+
When manipulating data you often need to create many Fixes to process a data file in the format and structure you need. With a text editor you can write all Fix functions in a singe separate Fix file.
80
80
81
-
The playground has an transformationFile-content area that can be used as if the Fix is in a separate file.
81
+
The playground has a transformationFile-content area that can be used as if the Fix is in a separate file.
82
82
In the playground we use the variable `transformationFile` to adress the Fix file in the playground.
2)[Add a field with todays date called `"map_date"`.](https://metafacture.org/playground/?flux=%22https%3A//openlibrary.org/books/OL2838758M.json%22%0A%7C+open-http%0A%7C+as-lines%0A%7C+decode-json%0A%7C+fix+%28transformationFile%29%0A%7C+encode-yaml%0A%7C+print%0A%3B&transformation=move_field%28%22type.key%22%2C%22pub_type%22%29%0Areplace_all%28%22pub_type%22%2C%22/type/%22%2C%22%22%29%0A...%28%22mape_date%22%2C%22...%22%29%0Aretain%28%22title%22%2C+%22publish_date%22%2C+%22by_statement%22%2C+%22pub_type%22%29)
171
171
172
-
Have a look at the fix functions: https://metafacture.org/metafacture-documentation/docs/fix/Fix-functions.html (Hint: you could use `add_field` or `timestamp`. And don't forget to add the new element to `retain`)
172
+
Have a look at the [Fix functions](https://metafacture.org/metafacture-documentation/docs/fix/Fix-functions.html). (Hint: you could use `add_field` or `timestamp`. And don't forget to add the new element to `retain`)