Commit 7fb0b04

Merge #21 from remote-tracking branch 'origin/improveABit'
2 parents 408f558 + 92acde0 commit 7fb0b04

11 files changed: +226 -221 lines changed

_config.yml

Lines changed: 2 additions & 2 deletions
@@ -2,10 +2,10 @@ title: Metafacture Tutorial
 description: This is a tutorial to Metafacture.
 theme: just-the-docs

-url: https://metafacture.github.io/metafacture-documentation
+url: https://metafacture.github.io/metafacture-tutorial

 aux_links:
-Metafacture Documentation on Github: https://github.com/metafacture/metafacture-tutorial
+Metafacture Tutorial on Github: https://github.com/metafacture/metafacture-tutorial

 # External navigation links
 nav_external_links:

docs/02_Introduction_into_Metafacture-Flux.md

Lines changed: 23 additions & 24 deletions
@@ -38,7 +38,7 @@ See the result below? It is `Hello, friend. I'am Metafacture!`.
 But what have we done here?
 We have a short text string `"Hello, friend. I'am Metafacture"`. That is printed with the modul `print`.

-A Metafacture Workflow is nothing else than an incoming text string that is manipulated by one or multiple moduls that do something with the incoming string.
+A Metafacture Workflow is nothing else than an incoming text string that is manipulated by one or multiple modules that do something with the incoming string.
 However, the workflow does not have to start with a text string but can also be a variable that stands for the text string and needs to be defined before the workflow. As this:

 ```text
@@ -93,8 +93,7 @@ inputFile
 ```

 The inputFile is opened as a file (`open-file`) and then processed line by line (`as-line`).
-You can see that in this [sample](https://metafacture.org/playground/?flux=inputFile%0A%7Copen-file%0A%7Cas-lines%0A%7Cprint%0A%3B&data=Hello%2C+friend.+I%27am+Metafacture%21).
-https://metafacture.org/playground/?flux=inputFile%0A%7C+open-file%0A%7C+as-lines%0A%7C+print%0A%3B&data=Hello%2C+friend.+I%27am+Metafacture%21
+Have a look at this [sample](https://metafacture.org/playground/?flux=inputFile%0A%7Copen-file%0A%7Cas-lines%0A%7Cprint%0A%3B&data=Hello%2C+friend.+I%27am+Metafacture%21).

 We usually do not start with any random text strings but with data. So lets play around with some data.

@@ -108,7 +107,7 @@ You will see data that look like this:

 This is data in JSON format. But it seems not very readable.

-But all these fields tell something about a publication, a book, with 268 pages and title Ordinary Vices by Judith N. Shklar.
+All these fields tell us something about a publication, a book, with 268 pages and title "Ordinary Vices" by Judith N. Shklar.

 Let's copy the JSON data into our `ìnputFile-content` field. [And run it again](https://metafacture.org/playground/?flux=inputFile%0A%7Copen-file%0A%7Cas-lines%0A%7Cprint%0A%3B&data=%7B%22publishers%22%3A+%5B%22Belknap+Press+of+Harvard+University+Press%22%5D%2C+%22identifiers%22%3A+%7B%22librarything%22%3A+%5B%22321843%22%5D%2C+%22goodreads%22%3A+%5B%222439014%22%5D%7D%2C+%22covers%22%3A+%5B413726%5D%2C+%22local_id%22%3A+%5B%22urn%3Atrent%3A0116301499939%22%2C+%22urn%3Asfpl%3A31223009984353%22%2C+%22urn%3Asfpl%3A31223011345064%22%2C+%22urn%3Acst%3A10017055762%22%5D%2C+%22lc_classifications%22%3A+%5B%22JA79+.S44+1984%22%2C+%22HM216+.S44%22%2C+%22JA79.S44+1984%22%5D%2C+%22key%22%3A+%22/books/OL2838758M%22%2C+%22authors%22%3A+%5B%7B%22key%22%3A+%22/authors/OL381196A%22%7D%5D%2C+%22ocaid%22%3A+%22ordinaryvices0000shkl%22%2C+%22publish_places%22%3A+%5B%22Cambridge%2C+Mass%22%5D%2C+%22subjects%22%3A+%5B%22Political+ethics.%22%2C+%22Liberalism.%22%2C+%22Vices.%22%5D%2C+%22pagination%22%3A+%22268+p.+%3B%22%2C+%22source_records%22%3A+%5B%22marc%3AOpenLibraries-Trent-MARCs/tier5.mrc%3A4020092%3A744%22%2C+%22marc%3Amarc_openlibraries_sanfranciscopubliclibrary/sfpl_chq_2018_12_24_run01.mrc%3A195791766%3A1651%22%2C+%22ia%3Aordinaryvices0000shkl%22%2C+%22marc%3Amarc_claremont_school_theology/CSTMARC1_barcode.mrc%3A137174387%3A3955%22%2C+%22bwb%3A9780674641754%22%2C+%22marc%3Amarc_loc_2016/BooksAll.2016.part15.utf8%3A115755952%3A680%22%2C+%22marc%3Amarc_claremont_school_theology/CSTMARC1_multibarcode.mrc%3A137367696%3A3955%22%2C+%22ia%3Aordinaryvices0000shkl_a5g0%22%2C+%22marc%3Amarc_columbia/Columbia-extract-20221130-001.mrc%3A328870555%3A1311%22%2C+%22marc%3Aharvard_bibliographic_metadata/ab.bib.01.20150123.full.mrc%3A156768969%3A815%22%5D%2C+%22title%22%3A+%22Ordinary+vices%22%2C+%22dewey_decimal_class%22%3A+%5B%22172%22%5D%2C+%22notes%22%3A+%7B%22type%22%3A+%22/type/text%22%2C+%22value%22%3A+%22Bibliography%3A+p.+251-260.\nIncludes+index.%22%7D%2C+%22number_of_pages%22%3A+268%2C+%22languages%22%3A+%5B%7B%22key%22%3A+%22/languages/eng%22%7D%5D%2C+%22lccn%22%3A+%5B%2284000531%22%5D%2C+%22isbn_10%22%3A+%5B%220674641752%22%5D%2C+%22publish_date%22%3A+%221984%22%2C+%22publish_country%22%3A+%22mau%22%2C+%22by_statement%22%3A+%22Judith+N.+Shklar.%22%2C+%22works%22%3A+%5B%7B%22key%22%3A+%22/works/OL2617047W%22%7D%5D%2C+%22type%22%3A+%7B%22key%22%3A+%22/type/edition%22%7D%2C+%22oclc_numbers%22%3A+%5B%2210348450%22%5D%2C+%22latest_revision%22%3A+16%2C+%22revision%22%3A+16%2C+%22created%22%3A+%7B%22type%22%3A+%22/type/datetime%22%2C+%22value%22%3A+%222008-04-01T03%3A28%3A50.625462%22%7D%2C+%22last_modified%22%3A+%7B%22type%22%3A+%22/type/datetime%22%2C+%22value%22%3A+%222024-12-27T16%3A46%3A50.181109%22%7D%7D).

@@ -117,14 +116,12 @@ The output in result is the same as the input and it is still not very readable.
 Lets turn the one line of JSON data into YAML. YAML is another format for structured information which is a bit easier to read for human eyes.
 In order to change the serialization of the data we need to decode the data and then encode the data.

-Metafacture has lots of decoder and encoder modules for specific data formats that can be used in an Flux workflow.
+Metafacture has lots of decoder and encoder modules for specific data formats that can be used in a Flux workflow.

 Let's try this out. Add the module `decode-json` and `encode-yaml` to your Flux Workflow.

 The Flux should now look like this:

-Flux:
-
 ```text
 inputFile
 | open-file
@@ -217,7 +214,7 @@ Luckily, we cannot only open the data we have in our `inputFile-content` field,

 Clear your playground and copy the following Flux workflow:

-```
+```text
 "https://openlibrary.org/books/OL2838758M.json"
 | open-http
 | as-lines
@@ -227,22 +224,24 @@ Clear your playground and copy the following Flux workflow:
 ;
 ```

-The [result in the playground](https://metafacture.org/playground/?flux=%22https%3A//openlibrary.org/books/OL2838758M.json%22%0A%7C+open-http%0A%7C+as-lines%0A%7C+decode-json%0A%7C+encode-yaml%0A%7C+print%0A%3B) should be the same as before without having to paste anything into the text field. We just used the module `open-http` and directly retrieved the data from the URL.
+The [result in the playground](https://metafacture.org/playground/?flux=%22https%3A//openlibrary.org/books/OL2838758M.json%22%0A%7C+open-http%0A%7C+as-lines%0A%7C+decode-json%0A%7C+encode-yaml%0A%7C+print%0A%3B) should be the same as before without having to paste anything into the text field. We just used the module `open-http` to directly retrieve the data from the URL.

-Let's take a look what a Flux workflow does. The Flux workflow is combination of different moduls to process incoming structured data. In our example we have different things that we do with these modules:
+Let's take a look at what a Flux workflow does. The Flux workflow is a combination of different modules to process incoming structured data. In our example we have different things that we do with these modules:

 1. We have a URL as input. The URL localizes the data on the web.
-2. We tell Metafacture to request the stated url using `open-http`.
+2. We tell Metafacture to request the stated URL using `open-http`.
 3. Then we define how to handle the incoming data: since the JSON is written in one line, we tell Metafacture to regard every new line as a new record with `as-lines`
-4. Afterwards we tell Metafacture to `decode-json` in order to translate the incoming data as json to the generic internal data model that is called metadata events
+4. Afterwards we tell Metafacture to `decode-json` in order to translate the incoming data as JSON to the generic internal data model that is called metadata events
 5. Then we instruct Metafacture to serialize the metadata events as YAML with `encode-yaml`
 6. Finally, we tell MF to `print` everything.

-So let's have a small recap of what we done and learned so far: * We played around with the Metafacture Playground.
-* We learned that a Metafacture Flux workflow is a combination of modules with an inital text string or an variable.
+So let's have a small recap of what we've done and learned so far:
+
+* We've played around with the Metafacture Playground.
+* We've learned that a Metafacture Flux workflow is a combination of modules with an inital text string or a variable.
 * We got to know different modules like `open-http`, `as-lines`. `decode-json`, `encode-yaml`, `print`

-More modules can be found in the [documentation of available flux commands](https://github.com/metafacture/metafacture-documentation/blob/master/flux-commands.html).
+More modules can be found in the [documentation of available flux commands](https://metafacture.github.io/metafacture-documentation/docs/flux/flux-commands.html).

 Now take some time and play around a little bit more and use some other modules.

@@ -268,16 +267,16 @@ Now take some time and play around a little bit more and use some other modules.
 What you see with the modules `encode-formeta` and `write` is that modules can have further specification in brackets.
 These can eiter be a string in `"..."` or attributes that define options as with `style=`.

-One last thing you should learn on an abstract level is to grasp the general idea of Metafacture Flux workflows is that they have many different moduls through which the data is flowing.
-The most abstract and most common process resemble the following steps:
+One last thing you should learn on an abstract level to grasp the general idea of Metafacture Flux workflows is that they have many different modules through which the data is flowing.
+The most abstract and most common process resembles the following steps:

 **→ read → decode → transform → encode → write →**

-This process is one that transforms incoming data in a way that is changed at the end.
+This process chain transforms incoming data in distinct steps.
 Each step can be done by one or a combination of multiple modules.
 Modules are small tools that do parts of the complete task we want to do.

-Each modul demands a certain input and give a certain output. This is called signature.
+Each modul demands a certain input and gives a certain output. This is called signature.
 e.g.:

 The first modul `open-file` expects a string and provides read data (called reader).
@@ -286,12 +285,12 @@ This reader data can be passed on to a modul that accepts reader data e.g. in ou

 If you have a look at the flux modul/command documentation then you see under signature which data a modul expects and which data it outputs.

-The combination of moduls is a Flux workflow.
+The combination of modules is called a "Flux workflow".

 Each module is separated by a `|` and every workflow ends with a `;`.
 Comments can be added with `//`.

-See:
+For example:

 ```
 //input string:
@@ -319,7 +318,7 @@ Add the option: <code>prettyPrinting="true"</code> to the <code>encode-json</cod



-2) Have a look at documentation of [`decode-xml`](https://metafacture.org/metafacture-documentation/docs/flux/flux-commands.html#decode-xml) what is different to `decode-json`? And what input does it expect and what output does it create (Hint: signature)?
+2) Have a look at the documentation of [`decode-xml`](https://metafacture.org/metafacture-documentation/docs/flux/flux-commands.html#decode-xml). What is different to `decode-json`? And what input does it expect and what output does it create (hint: signature)?

 <details>
 <summary>Answer</summary>
@@ -329,7 +328,7 @@ The signature of <code>decode-xml</code> and <code>decode-json</code> is quiet d
 <code>decode-json</code>: signature: String -> StreamReceiver

 Explanation:
-<code>decode-xml</code> expects data from Reader output of <code>open-file</code> or <code>open-http</code>, and creates output that can be transformed by a specific xml <code>handler</code>. The xml parser of <code>decode-xml</code> works straight with read content of a file or a url.
+<code>decode-xml</code> expects data from Reader output of <code>open-file</code> or <code>open-http</code>, and creates output that can be transformed by a specific XML <code>handler</code>. The XML parser of <code>decode-xml</code> works straight by reading the content of a file or a URL.

 <code>decode-json</code> expects data from output of a string like <code>as-lines</code> or <code>as-records</code> and creates output that could be transformed by <code>fix</code> or encoded with a module like <code>encode-xml</code>. For the most decoding you have to specify how (<code>as-lines</code> or <code>as-records</code>) the incoming data is read.
 </details>
@@ -354,7 +353,7 @@ Explanation:

 As you surely already saw I mentioned transform as one step in a metafacture workflow.

-But aside from changing the serialisation we did not play around with transformations yet.
+But aside from changing the serialization we did not play around with transformations yet.
 This will be the theme of the next session.

 ---------------

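Note on the file above: the **read → decode → transform → encode → write** chain and the idea of module signatures can be illustrated outside Flux. The following is a minimal Python sketch, not Metafacture code. The function names `as_lines`, `decode_json`, `encode_pretty_json` and `run` are hypothetical stand-ins chosen to mirror the Flux module names, and plain dictionaries stand in for Metafacture's metadata events.

```python
import io
import json
from typing import Iterable, Iterator


def as_lines(reader: io.TextIOBase) -> Iterator[str]:
    """Reader -> String: emit every non-empty line as one record."""
    for line in reader:
        line = line.strip()
        if line:
            yield line


def decode_json(records: Iterable[str]) -> Iterator[dict]:
    """String -> structured record (a dict stands in for metadata events)."""
    for record in records:
        yield json.loads(record)


def encode_pretty_json(records: Iterable[dict]) -> Iterator[str]:
    """Structured record -> String, indented so it is easier to read."""
    for record in records:
        yield json.dumps(record, indent=2, sort_keys=True)


def run(source, *modules):
    """Chain the modules left to right, similar to `|` in a Flux workflow."""
    stream = source
    for module in modules:
        stream = module(stream)
    return list(stream)


# Hypothetical one-line input, loosely modelled on the tutorial's book record.
input_file = io.StringIO('{"title": "Ordinary vices", "number_of_pages": 268}\n')
result = run(input_file, as_lines, decode_json, encode_pretty_json)
print(result[0])
```

Each function only accepts what the previous one produces, which is the point the tutorial makes about signatures: swapping `as_lines` and `decode_json` would fail because the types no longer line up.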
docs/03_Introduction_into_Metafacture-Fix.md

Lines changed: 14 additions & 14 deletions
@@ -7,15 +7,15 @@ parent: Tutorial

 # Lesson 3: Introduction into Metafacture Fix

-In the last session we learned about Flux moduls.
-Flux moduls can do a lot of things. They configure the "high-level" transformation pipeline.
+In the last session we've learned about Flux modules.
+Flux modules can do a lot of things. They configure the "high-level" transformation pipeline.

-But the main transformation of incoming data at record, elemenet and value level is usually done by the transformation moduls [Fix](https://metafacture.github.io/metafacture-documentation/docs/flux/flux-commands.html#fix) or [Morph](https://metafacture.github.io/metafacture-documentation/docs/flux/flux-commands.html#morph) as one step in the pipeline.
+But the main transformation of incoming data at record, element and value level is usually done by the transformation modules [Fix](https://metafacture.github.io/metafacture-documentation/docs/flux/flux-commands.html#fix) or [Morph](https://metafacture.github.io/metafacture-documentation/docs/flux/flux-commands.html#morph) as one step in the pipeline.

 By transformation we mean things like:

 * Manipulating element names and element values
-* Change hierachies and structures of records
+* Changing hierachies and structures of records
 * Lookup values in concordance list

 But not changing serialization that is part of encoding and decoding.
@@ -47,10 +47,10 @@ You should end up with something like:
 title: "Ordinary vices"
 ```

-The Fix module, called by `fix`, in Metafacture is used to manipulate the input data filtering fields we would like to see. Only one fix-function was used: `retain`, which throws away all the data from the input except the stated `"title"` field. Normally all incoming data is passed through, unless it is somehow manipulated or a `retain` function is used.
+The Fix module, called by `fix`, is used to manipulate the input data filtering fields we would like to see. Only one Fix-function was used: `retain`, which throws away all the data from the input except the stated `"title"` field. Normally all incoming data is passed through, unless it is somehow manipulated or a `retain` function is used.

-HINT: As long as you embedd the fix functions in the Flux Workflow, you have to use double quotes to fence the fix functions,
-and single quotes in the fix functions. As we did here: `fix ("retain('title')")`
+HINT: As long as you embed the Fix functions in the Flux Workflow, you have to use double quotes to fence the Fix functions,
+and single quotes in the Fix functions. As we did here: `fix ("retain('title')")`

 Now let us additionally keep the info that is given in the element `"publish_date"` and the subfield `"key"` in `'type'` by adding `'publish_date', 'type.key'` to `retain`:

@@ -76,9 +76,9 @@ notes:

 ```

-When manipulating data you often need to create many fixes to process a data file in the format and structure you need. With a text editor you can write all fix functions in a singe separate Fix file.
+When manipulating data you often need to create many Fixes to process a data file in the format and structure you need. With a text editor you can write all Fix functions in a singe separate Fix file.

-The playground has an transformationFile-content area that can be used as if the Fix is in a separate file.
+The playground has a transformationFile-content area that can be used as if the Fix is in a separate file.
 In the playground we use the variable `transformationFile` to adress the Fix file in the playground.

 Like this.
@@ -93,16 +93,16 @@ retain("title", "publish_date", "notes.value", "type.key")

 Using a separate Fix file is recommended if you need to write many Fix functions. It will keep the Flux workflow clear and legible.

-To add more fixes we can again edit the Fix file.
+To add more Fixes we can again edit the Fix file.
 Lets add these lines in front of the retain function:

-```
+```perl
 move_field("type.key", "pub_type")
 ```

 Also change the `retain` function so that you keep the new element `"pub_type"` instead of the not existing nested `"key"` element.

-```
+```perl
 move_field("type.key","pub_type")
 retain("title", "publish_date", "notes.value", "pub_type")
 ```
@@ -121,7 +121,7 @@ notes:
 With `move_field` we moved and renamed an existing element.
 As next step add the following function before the `retain` function.

-```
+```perl
 replace_all("pub_type","/type/","")
 ```

@@ -169,7 +169,7 @@ retain("title", "publish_date", "pub_type")

 2) [Add a field with todays date called `"map_date"`.](https://metafacture.org/playground/?flux=%22https%3A//openlibrary.org/books/OL2838758M.json%22%0A%7C+open-http%0A%7C+as-lines%0A%7C+decode-json%0A%7C+fix+%28transformationFile%29%0A%7C+encode-yaml%0A%7C+print%0A%3B&transformation=move_field%28%22type.key%22%2C%22pub_type%22%29%0Areplace_all%28%22pub_type%22%2C%22/type/%22%2C%22%22%29%0A...%28%22mape_date%22%2C%22...%22%29%0Aretain%28%22title%22%2C+%22publish_date%22%2C+%22by_statement%22%2C+%22pub_type%22%29)

-Have a look at the fix functions: https://metafacture.org/metafacture-documentation/docs/fix/Fix-functions.html (Hint: you could use `add_field` or `timestamp`. And don't forget to add the new element to `retain`)
+Have a look at the [Fix functions](https://metafacture.org/metafacture-documentation/docs/fix/Fix-functions.html). (Hint: you could use `add_field` or `timestamp`. And don't forget to add the new element to `retain`)


 <details>

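Note on the file above: the effect of the three Fix functions used in this lesson (`move_field`, `replace_all`, `retain`) can be approximated on a nested Python dictionary. This is a rough sketch under stated assumptions, not how Metafacture implements Fix. The helpers `get_path` and `set_path` are hypothetical, dotted paths are resolved naively, and an emptied parent element such as `type` is simply left behind after a move.

```python
import re
from typing import Any


def get_path(record: dict, path: str) -> Any:
    """Resolve a dotted path such as "type.key" in a nested dict."""
    node = record
    for key in path.split("."):
        node = node[key]
    return node


def set_path(record: dict, path: str, value: Any) -> None:
    """Set a dotted path, creating intermediate dicts where needed."""
    keys = path.split(".")
    node = record
    for key in keys[:-1]:
        node = node.setdefault(key, {})
    node[keys[-1]] = value


def move_field(record: dict, source: str, target: str) -> None:
    """Move (and thereby rename) the value at `source` to `target`."""
    value = get_path(record, source)
    *parents, last = source.split(".")
    parent = get_path(record, ".".join(parents)) if parents else record
    del parent[last]
    set_path(record, target, value)


def replace_all(record: dict, path: str, pattern: str, replacement: str) -> None:
    """Replace every regex match in the string value at `path`."""
    set_path(record, path, re.sub(pattern, replacement, get_path(record, path)))


def retain(record: dict, *paths: str) -> dict:
    """Return a new record holding only the listed paths that exist."""
    kept: dict = {}
    for path in paths:
        try:
            set_path(kept, path, get_path(record, path))
        except (KeyError, TypeError):
            continue  # a path missing from the record is skipped
    return kept


# Hypothetical excerpt of the book record used in the tutorial.
record = {
    "title": "Ordinary vices",
    "publish_date": "1984",
    "type": {"key": "/type/edition"},
    "number_of_pages": 268,
}
move_field(record, "type.key", "pub_type")
replace_all(record, "pub_type", "/type/", "")
result = retain(record, "title", "publish_date", "pub_type")
print(result)
```

As in the lesson, the order matters: `retain` comes last, because any field it drops is no longer available to later functions.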