Commit 2c16fab

Update dockerfile and documentation
1 parent 0c95c0d commit 2c16fab

File tree

6 files changed

+109
-106
lines changed

docs/processors_catalogue/ngsi_ckan_sink.md

Lines changed: 64 additions & 64 deletions
@@ -1,25 +1,25 @@
11
# NGSIToCKAN
22
Content:
33

4-
- [Functionality](#section1)
5-
- [Mapping NGSI-LD events to `NGSI-LDEvent` objects](#section1.1)
6-
- [Mapping `NGSI-LDEvents` to CKAN data structures](#section1.2)
7-
- [Organizations naming conventions](#section1.2.1)
8-
- [Package/dataset naming conventions](#section1.2.2)
9-
- [Resource naming conventions](#section1.2.3)
10-
- [Column-like storing](#section1.2.4)
11-
- [Example](#section1.3)
12-
- [NGSI-LDEvent](#section1.3.1)
13-
- [Organization, dataset and resource names](#section1.3.2)
14-
- [Column-like storing](#section1.3.3)
15-
- [Administration guide](#section2)
16-
- [Configuration](#section2.1)
17-
- [Use cases](#section2.2)
18-
- [Important notes](#section2.3)
19-
- [About the persistence mode](#section2.3.1)
20-
- [About the encoding](#section2.3.3)
21-
- [Programmers guide](#section3)
22-
- [`NGSICKANSink` class](#section3.1)
4+
- [Functionality](#section1)
5+
- [Mapping NGSI-LD events to `NGSI-LDEvent` objects](#section1.1)
6+
- [Mapping `NGSI-LDEvents` to CKAN data structures](#section1.2)
7+
- [Organizations naming conventions](#section1.2.1)
8+
- [Package/dataset naming conventions](#section1.2.2)
9+
- [Resource naming conventions](#section1.2.3)
10+
- [Column-like storing](#section1.2.4)
11+
- [Example](#section1.3)
12+
- [NGSI-LDEvent](#section1.3.1)
13+
- [Organization, dataset and resource names](#section1.3.2)
14+
- [Column-like storing](#section1.3.3)
15+
- [Administration guide](#section2)
16+
- [Configuration](#section2.1)
17+
- [Use cases](#section2.2)
18+
- [Important notes](#section2.3)
19+
- [About the persistence mode](#section2.3.1)
20+
- [About the encoding](#section2.3.3)
21+
- [Programmers guide](#section3)
22+
- [`NGSICKANSink` class](#section3.1)
2323

2424
## <a name="section1"></a>Functionality
2525
`NGSIToCKAN` is a processor designed to persist NGSI-LD-like context data events within a [CKAN](http://ckan.org/) server. Usually, such context data is notified by a
@@ -38,7 +38,7 @@ This is done at the Draco-ngsi Http listeners (in NiFi, processors) thanks to NG
3838
[Top](#top)
3939

4040
### <a name="section1.2"></a>Mapping `NGSI-LDEvent`s to CKAN data structures
41-
[CKAN ](http://docs.ckan.org/en/latest/user-guide.html) organizes the data in organizations containing packages or datasets; each one of these packages/datasets contains several resources whose data is finally stored in a PostgreSQL database (CKAN Datastore) or plain files (CKAN Filestore). Such organization is exploited by `NGSICKANSink` each time a `NGSI-LDEvent` is going to be persisted.
41+
[CKAN](http://docs.ckan.org/en/latest/user-guide.html) organizes the data in organizations containing packages or datasets; each one of these packages/datasets contains several resources whose data is finally stored in a PostgreSQL database (CKAN Datastore) or plain files (CKAN Filestore). Such organization is exploited by `NGSICKANSink` each time a `NGSI-LDEvent` is going to be persisted.
4242

4343
[Top](#top)
4444

@@ -49,7 +49,7 @@ https://www.postgresql.org/docs/current/static/sql-syntax-lexical.html#SQL-SYNTA
4949
Nevertheless, different than PostgreSQL, [organization lengths](http://docs.ckan.org/en/latest/api/#ckan.logic.action.create.organization_create) may be up to 100 characters (minimum, 2 characters).
5050

5151

52-
* Data model by entity id (`data_model=dm-by-entity-id`). The organization name will take the value of the notified header `fiware-service`. Note that in this case, encoding is never applied.
52+
- Data model by entity id (`data_model=dm-by-entity-id`). The organization name will take the value of the notified header `fiware-service`. Note that in this case, encoding is never applied.
5353

5454
The following table summarizes the organization name composition:
5555

@@ -60,12 +60,12 @@ The following table summarizes the organization name composition:
6060
[Top](#top)
6161

6262
#### <a name="section1.2.2"></a>Packages/datasets naming conventions
63-
* Data model by entity (`data_model=dm-by-entity`). A package/dataset named as the notified `fiware-service` header value (or, in absence of such header, the defaulted value for the FIWARE service ) is created (if not existing yet) in the above organization.
63+
- Data model by entity (`data_model=dm-by-entity`). A package/dataset named as the notified `fiware-service` header value (or, in absence of such header, the defaulted value for the FIWARE service ) is created (if not existing yet) in the above organization.
6464
Since CKAN is based on PostgreSQL, and [PostgreSQL only accepts](https://www.postgresql.org/docs/current/static/sql-syntax-lexical.html#SQL-SYNTAX-IDENTIFIERS) a restricted set of identifier characters, only alphanumeric characters and the underscore (`_`) are accepted. The hyphen (`-`) is also accepted. As a result, a certain [encoding](#section2.3.3) is applied depending on the `enable_encoding` configuration parameter.
6565
Nevertheless, different than PostgreSQL, [dataset lengths](http://docs.ckan.org/en/latest/api/#ckan.logic.action.create.package_create) may be up to 100 characters (minimum, 2 characters).
6666

6767

68-
* Data model by entity id (`data_model=dm-by-entity-id`). A package/dataset name always take the entity ID. Such a name is already given in the NGSI-LDEvent values, see the [Configuration](#section2.1) section for more details) within the the `NGSI-LDEvent`. Note that in this case, encoding is never applied.
68+
- Data model by entity id (`data_model=dm-by-entity-id`). A package/dataset name always take the entity ID. Such a name is already given in the NGSI-LDEvent values, see the [Configuration](#section2.1) section for more details) within the the `NGSI-LDEvent`. Note that in this case, encoding is never applied.
6969

7070
The following table summarizes the package name composition:
7171

@@ -78,10 +78,10 @@ The following table summarizes the package name composition:
7878
#### <a name="section1.2.3"></a>Resources naming conventions
7979
The resource name depends on the configured data model (see the [Configuration](#section2.1) section for more details):
8080

81-
* Data model by entity (`data_model=dm-by-entity`). A resource name always take the concatenation of the entity ID and type. Such a name is already given in the `notified_entities`/`grouped_entities` header values (depending on using or not the grouping rules, see the [Configuration](#section2.1) section for more details) within the `NGSI-LDEvent`.
81+
- Data model by entity (`data_model=dm-by-entity`). A resource name always take the concatenation of the entity ID and type. Such a name is already given in the `notified_entities`/`grouped_entities` header values (depending on using or not the grouping rules, see the [Configuration](#section2.1) section for more details) within the `NGSI-LDEvent`.
8282

8383

84-
* Data model by entity id (`data_model=dm-by-entity-id`). A resource name always take the entity ID. Such a name is already given in the NGSI-LDEvent values, see the [Configuration](#section2.1) section for more details) within the the `NGSI-LDEvent`. Note that in this case, encoding is never applied.
84+
- Data model by entity id (`data_model=dm-by-entity-id`). A resource name always take the entity ID. Such a name is already given in the NGSI-LDEvent values, see the [Configuration](#section2.1) section for more details) within the the `NGSI-LDEvent`. Note that in this case, encoding is never applied.
8585

8686
Note that a CKAN Datastore (and a viewer) is also created and associated with the resource above. This datastore, which is ultimately a PostgreSQL table, will hold the persisted data.
8787

@@ -101,10 +101,10 @@ The following table summarizes the resource name composition:
101101
#### <a name="section1.2.3"></a>Column-like storing
102102
Regarding the specific data stored within the datastore associated to the resource, if `attr_persistence` parameter is set to `column` then a single line is composed for the whole notified entity, containing the following fields:
103103

104-
* `recvTime`: UTC timestamp in human-redable format ([ISO 8601](http://en.wikipedia.org/wiki/ISO_8601)).
105-
* `entityId`: Notified entity identifier.
106-
* `entityType`: Notified entity type.
107-
* For each notified property/relationship, a field named as the property/relationship is considered. This field will store the property/relationship values along the time, if no unique value is presented, the values will be stored like a JSON string.
104+
- `recvTimeTs` UTC timestamp in human-redable format ([ISO 8601](http://en.wikipedia.org/wiki/ISO_8601)).
105+
- `entityId`: Notified entity identifier.
106+
- `entityType`: Notified entity type.
107+
- For each notified property/relationship, a field named as the property/relationship is considered. This field will store the property/relationship values along the time, if no unique value is presented, the values will be stored like a JSON string.
108108

109109

110110
[Top](#top)
@@ -258,30 +258,30 @@ NOTE: `curl` is a Unix command allowing for interacting with REST APIs such as t
258258
### <a name="section2.1"></a>Configuration
259259
`NGSIToCKAN` is configured through the following parameters:
260260

261-
| Parameter | Mandatory | Default value | Comments |
262-
|---|---|---|---|
263-
| CKAN Host | no | localhost | FQDN/IP address where the CKAN server runs. ||
264-
| CKAN Port | no | 80 ||
265-
| CKAN Viewer | no | recline\_grid\_view | Please check the [available](http://docs.ckan.org/en/latest/maintaining/data-viewer.html) viewers at CKAN documentation. |
266-
| CKAN API Key | yes | N/A ||
267-
| ORION URL | yes | http://localhost:1026 | To be put as the filestore URL. |
268-
| SSL | no | false ||
269-
| NGSI Version | yes | ld | The NGSI version of the incoming notification could (currently only ngsi-ld available)|
270-
| Data Model | no | dm-by-entity | <i>dm-by-entity-id</i>, <i>dm-by-entity</i> |
271-
| Attribute Persistence | no | column | <i>column.</i>|
272-
| Default Service | no | test | The default Fiware service value for being used instead of the fiware-service header received for build the organization name |
273-
| Default Service Path| no | /path | The default Fiware service path value for being used instead of the fiware-service.path header received for build the package name (currently not used) |
274-
| Create DataStore | no | true | IF it is tru the DataStore is create and the data is stored in CKAN, otherwise teh Data store is not created and, in this way the Organization, package and dataset with the metadata is created associated with a link with the external resource |
275-
| batch\_size | no | 1 | Number of events accumulated before persistence. |
276-
| Enable Encoding | no | false | <i>true</i> or <i>false</i>, <i>true</i> applies the new encoding, <i>false</i> applies the old encoding. ||
277-
| Enable Lowercase | no | false | <i>true</i> or <i>false</i>. for applying lowercase to the name of organization, package dataset and resource||
278-
| Batch Size | no | 1 | Number of events accumulated before persistence. |
279-
| batch\_timeout | no | 30 | Number of seconds the batch will be building before it is persisted as it is. |
280-
| batch\_ttl | no | 10 | Number of retries when a batch cannot be persisted. Use `0` for no retries, `-1` for infinite retries. Please, consider an infinite TTL (even a very large one) may consume all the sink's channel capacity very quickly. |
281-
| batch\_retry\_intervals | no | 5000 | Comma-separated list of intervals (in miliseconds) at which the retries regarding not persisted batches will be done. First retry will be done as many miliseconds after as the first value, then the second retry will be done as many miliseconds after as second value, and so on. If the batch\_ttl is greater than the number of intervals, the last interval is repeated. |
282-
| Max Connections | no | 500 | Maximum number of connections allowed for a Http-based HDFS backend. |
283-
| Max Connections per route | no | 100 | Maximum number of connections per route allowed for a Http-based HDFS backend. |
284-
| Rollback on failure| false | false | Do a rollback in case of failure |
261+
| Parameter | Mandatory | Default value | Comments |
262+
|---|---|---------------------------|---|
263+
| CKAN Host | no | localhost | FQDN/IP address where the CKAN server runs. ||
264+
| CKAN Port | no | 80 ||
265+
| CKAN Viewer | no | recline\_grid\_view | Please check the [available](http://docs.ckan.org/en/latest/maintaining/data-viewer.html) viewers at CKAN documentation. |
266+
| CKAN API Key | yes | N/A ||
267+
| ORION URL | yes | [http://localhost:1026](http://localhost:1026) | To be put as the filestore URL. |
268+
| SSL | no | false ||
269+
| NGSI Version | yes | ld | The NGSI version of the incoming notification could (currently only ngsi-ld available)|
270+
| Data Model | no | dm-by-entity | <i>dm-by-entity-id</i>, <i>dm-by-entity</i> |
271+
| Attribute Persistence | no | column | <i>column.</i>|
272+
| Default Service | no | test | The default Fiware service value for being used instead of the fiware-service header received for build the organization name |
273+
| Default Service Path| no | /path | The default Fiware service path value for being used instead of the fiware-service.path header received for build the package name (currently not used) |
274+
| Create DataStore | no | true | IF it is tru the DataStore is create and the data is stored in CKAN, otherwise teh Data store is not created and, in this way the Organization, package and dataset with the metadata is created associated with a link with the external resource |
275+
| batch\_size | no | 1 | Number of events accumulated before persistence. |
276+
| Enable Encoding | no | false | <i>true</i> or <i>false</i>, <i>true</i> applies the new encoding, <i>false</i> applies the old encoding. ||
277+
| Enable Lowercase | no | false | <i>true</i> or <i>false</i>. for applying lowercase to the name of organization, package dataset and resource||
278+
| Batch Size | no | 1 | Number of events accumulated before persistence. |
279+
| batch\_timeout | no | 30 | Number of seconds the batch will be building before it is persisted as it is. |
280+
| batch\_ttl | no | 10 | Number of retries when a batch cannot be persisted. Use `0` for no retries, `-1` for infinite retries. Please, consider an infinite TTL (even a very large one) may consume all the sink's channel capacity very quickly. |
281+
| batch\_retry\_intervals | no | 5000 | Comma-separated list of intervals (in miliseconds) at which the retries regarding not persisted batches will be done. First retry will be done as many miliseconds after as the first value, then the second retry will be done as many miliseconds after as second value, and so on. If the batch\_ttl is greater than the number of intervals, the last interval is repeated. |
282+
| Max Connections | no | 500 | Maximum number of connections allowed for a Http-based HDFS backend. |
283+
| Max Connections per route | no | 100 | Maximum number of connections per route allowed for a Http-based HDFS backend. |
284+
| Rollback on failure| false | false | Do a rollback in case of failure |
285285

286286
A configuration example could be:
287287
![NGSIToCKAN configuration example](../images/processor-ckan.png)
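
As a rough illustration of the `batch_retry_intervals` row in the table above, the following sketch picks the wait before a given retry and repeats the last interval once the list is exhausted. The function name `retry_delay_ms` and its 1-based `attempt` argument are hypothetical and not part of Draco; this is only one reading of the parameter description.

```python
def retry_delay_ms(batch_retry_intervals: str, attempt: int) -> int:
    """Return the wait (in milliseconds) before retry number `attempt` (1-based).

    `batch_retry_intervals` is the comma-separated list from the configuration,
    e.g. "5000,10000". Both names are illustrative assumptions.
    """
    if attempt < 1:
        raise ValueError("attempt must be 1 or greater")

    # Parse the comma-separated list, ignoring empty items such as a trailing comma.
    intervals = [int(item) for item in batch_retry_intervals.split(",") if item.strip()]
    if not intervals:
        raise ValueError("at least one interval is required")

    # When there are more retries than intervals, the last interval is repeated.
    index = min(attempt, len(intervals)) - 1
    return intervals[index]


if __name__ == "__main__":
    for attempt in range(1, 5):
        print(attempt, retry_delay_ms("5000,10000", attempt))
```

With `batch_ttl` set to 4 and two configured intervals, the third and fourth retries would reuse the second interval under this reading.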
@@ -310,21 +310,21 @@ By default, `NGSIToCKAN` has a configured batch size and batch accumulation time
310310
#### <a name="section2.3.3"></a>About the encoding
311311
Until version 1.2.0 (included), Draco applied a very simple encoding:
312312

313-
* All non alphanumeric characters were replaced by underscore, `_`.
314-
* The underscore was used as concatenator character as well.
313+
- All non alphanumeric characters were replaced by underscore, `_`.
314+
- The underscore was used as concatenator character as well.
315315

316316

317317
From version 1.3.0 (included), Draco applies this specific encoding tailored to CKAN data structures:
318318

319-
* Lowercase alphanumeric characters are not encoded.
320-
* Upercase alphanumeric characters are encoded.
321-
* Numeric characters are not encoded.
322-
* Underscore character, `_`, is not encoded.
323-
* Hyphen character, `-`, is not encoded.
324-
* Equals character, `=`, is encoded as `xffff`.
325-
* All other characters, including the slash in the FIWARE service paths, are encoded as a `x` character followed by the [Unicode](http://unicode-table.com) of the character.
326-
* User defined strings composed of a `x` character and a Unicode are encoded as `xx` followed by the Unicode.
327-
* `xffff` is used as concatenator character.
319+
- Lowercase alphanumeric characters are not encoded.
320+
- Upercase alphanumeric characters are encoded.
321+
- Numeric characters are not encoded.
322+
- Underscore character, `_`, is not encoded.
323+
- Hyphen character, `-`, is not encoded.
324+
- Equals character, `=`, is encoded as `xffff`.
325+
- All other characters, including the slash in the FIWARE service paths, are encoded as a `x` character followed by the [Unicode](http://unicode-table.com) of the character.
326+
- User defined strings composed of a `x` character and a Unicode are encoded as `xx` followed by the Unicode.
327+
- `xffff` is used as concatenator character.
328328

329329
Although the old encoding will be deprecated in the future, it is possible to switch the encoding type through the `enable_encoding` parameter as explained in the [configuration](#section2.1) section.
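
The newer encoding rules listed above could be sketched roughly as follows. This is a minimal illustration, not Draco's actual implementation. The function name `encode_ckan` is hypothetical, and the sketch assumes that "the Unicode of the character" means its code point written as four lowercase hexadecimal digits, and that uppercase letters are encoded with that same `x` + code point rule.

```python
import re

# Concatenator used between name components, as stated in the rules above.
CONCATENATOR = "xffff"

# Assumption: a user-supplied "x + Unicode" sequence is an `x` followed by 4 hex digits.
_ESCAPE_LIKE = re.compile(r"x[0-9a-fA-F]{4}")


def encode_ckan(name: str) -> str:
    """Encode `name` following the listed rules (illustrative sketch only)."""
    out = []
    i = 0
    while i < len(name):
        ch = name[i]
        if ch == "x" and _ESCAPE_LIKE.match(name, i):
            # A literal "xNNNN" in the input is protected with a doubled `x`.
            out.append("xx" + name[i + 1:i + 5])
            i += 5
            continue
        if ch == "=":
            out.append("xffff")
        elif "a" <= ch <= "z" or "0" <= ch <= "9" or ch in "_-":
            # Lowercase letters, digits, underscore and hyphen pass through.
            out.append(ch)
        else:
            # Everything else, including uppercase letters and `/`.
            out.append("x%04x" % ord(ch))
        i += 1
    return "".join(out)


if __name__ == "__main__":
    for sample in ("room_1", "Room", "/path", "a=b", "x0041"):
        print(sample, "->", encode_ckan(sample))
```

Under these assumptions, joining two encoded components would look like `encode_ckan(entity_id) + CONCATENATOR + encode_ckan(entity_type)`, where both variable names are placeholders.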
330330

docs/processors_catalogue/upadate_ckan_metadata.md

Lines changed: 11 additions & 11 deletions
@@ -1,17 +1,17 @@
11
# UpdateCKANMetadata
22
Content:
33

4-
- [Functionality](#section1)
5-
- [Mapping NGSI-LD events to `NGSI-LDEvent` objects](#section1.1)
6-
- [Mapping `NGSI-LDEvents` to DCAT-AP Metadata into CKAN data structures](#section1.2)
7-
- [DCAT-AP Metadata tags for Organizations](#section1.2.1)
8-
- [DCAT-AP Metadata tags for Packages/Datasets](#section1.2.2)
9-
- [DCAT-AP Metadata tags for Resources](#section1.2.3)
10-
- [Administration guide](#section2)
11-
- [Configuration](#section2.1)
12-
- [Use cases](#section2.2)
13-
- [Programmers guide](#section3)
14-
- [`UpdateCKANMetadata` class](#section3.1)
4+
- [Functionality](#section1)
5+
- [Mapping NGSI-LD events to `NGSI-LDEvent` objects](#section1.1)
6+
- [Mapping `NGSI-LDEvents` to DCAT-AP Metadata into CKAN data structures](#section1.2)
7+
- [DCAT-AP Metadata tags for Organizations](#section1.2.1)
8+
- [DCAT-AP Metadata tags for Packages/Datasets](#section1.2.2)
9+
- [DCAT-AP Metadata tags for Resources](#section1.2.3)
10+
- [Administration guide](#section2)
11+
- [Configuration](#section2.1)
12+
- [Use cases](#section2.2)
13+
- [Programmers guide](#section3)
14+
- [`UpdateCKANMetadata` class](#section3.1)
1515

1616
## <a name="section1"></a>Functionality
1717
`UpdateCKANMetadata`, is a processor designed to add additional metadata to an incoming flowfile or NGSI-LD event. This processor was designed to include all the metadata fields needed for complying with [DCAT-AP v2.0.1](https://joinup.ec.europa.eu/collection/semantic-interoperability-community-semic/news/dcat-ap-release-201)

docs/quick_start_guide.md

Lines changed: 2 additions & 2 deletions
@@ -62,8 +62,8 @@ mysql latest 273a1eca2d3a 2 weeks ago
6262
(2) Once you have your containers up and running, you can add the template provided for persisting data to MySQL.
6363

6464
First, open Draco in your browser at `https://localhost:9090/nifi/` and log in with the following credentials:
65-
- `user: admin`
66-
- `password: pass1234567890`
65+
- `user: admin`
66+
- `password: pass1234567890`
6767

6868
The next image provides you the location of many components of Draco. Please put special attention to the template
6969
button, play button and processor component, you will use them later.
