Commit ee18306

Merge pull request #59 from akhalymon/PLUGIN-72
PLUGIN-72 SAP Hana database Batch source, sink, action and post-action plugins
2 parents 00215ac + 3d70b56

25 files changed: +1247 −2 lines

README.md

Lines changed: 14 additions & 1 deletion
@@ -13,12 +13,20 @@ mvn clean test \
 ```
 Note that you must replace the Aurora MySQL and Aurora PostgreSQL properties with real values before running tests.

 ## Setup Local Environment
-MySQL, Postgresql, MSSQL, DB2, MemSQL are using prebuild images.
+MySQL, PostgreSQL, MSSQL, DB2, MemSQL, and SAP HANA use prebuilt images.

 The Oracle DB image must be built separately.

 The MemSQL image must be configured after startup.

+Note that you must log in to a Docker account to pull the SAP HANA image;
+an account can be created [here](https://hub.docker.com/signup).
+Also note that SAP HANA is sensitive to certain CPU instructions;
+a CPU model such as "host-passthrough" may be required when running inside a VM.
+SAP HANA requires that the database password be provided through a URL.
+The convenience script `docker-compose/db-plugins-env/saphana-password-server.sh`
+is provided for this purpose.
+
 Netezza requires VMware Player for running the Netezza emulator.

 * [Install Docker Compose](https://docs.docker.com/compose/install/)
@@ -36,8 +44,13 @@ export MEMSQL_LICENSE_KEY=YOUR_LICENSE_KEY
 ```bash
 docker-compose up memsql
 ```
+* Start the SAP HANA password service:
+```bash
+bash saphana-password-server.sh &
+```
 * Start the docker environment by running:
 ```bash
+cd docker-compose/db-plugins-env/
 docker-compose up -d
 ```
 * Connect to MemSQL Studio at [http://localhost:8888](http://localhost:8888)

docker-compose/db-plugins-env/docker-compose.yml

Lines changed: 23 additions & 1 deletion
@@ -57,10 +57,32 @@ services:
       - ORACLE_SID=cdap
       - ORACLE_PDB=mydb
       - ORACLE_PWD=123Qwe123
+
   memsql:
     image: memsql/cluster-in-a-box:centos-6.8.10-a53e479edc-1.9.0-1.3.0
     ports:
       - 3308:3306
       - 8888:8080
     environment:
-      - LICENSE_KEY=${MEMSQL_LICENSE_KEY}
+      - LICENSE_KEY=${MEMSQL_LICENSE_KEY}
+
+  saphana:
+    image: store/saplabs/hanaexpress:2.00.040.00.20190729.1
+    hostname: hxehost
+    ports:
+      - 39017:39017
+      - 39013:39013
+    ulimits:
+      nproc: 65535
+      nofile:
+        soft: 1048576
+        hard: 1048576
+    sysctls:
+      - kernel.shmmax=1073741824
+      - net.ipv4.ip_local_port_range=60000 65535
+      - kernel.shmmni=524288
+      - kernel.shmall=8388608
+    extra_hosts:
+      # Alter this if running on a non-Linux machine
+      - "host:172.17.0.1"
+    command: --agree-to-sap-license --passwords-url http://host:1500
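The `extra_hosts` entry above hardcodes the default Linux bridge IP, which is why the comment warns about non-Linux machines. A hedged alternative (not part of this commit; assumes Docker 20.10+, where the special value `host-gateway` resolves to the host on any platform):

```yaml
# Sketch only, not from this commit: on Docker 20.10+ the value
# "host-gateway" resolves to the host without hardcoding the bridge IP.
# Older Docker for Mac/Windows exposes "host.docker.internal" instead.
extra_hosts:
  - "host:host-gateway"
```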
docker-compose/db-plugins-env/saphana-password-server.sh

Lines changed: 18 additions & 0 deletions
#!/bin/bash
# Copyright © 2019 Cask Data, Inc.
#
# Licensed under the Apache License, Version 2.0 (the "License"); you may not
# use this file except in compliance with the License. You may obtain a copy of
# the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
# WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
# License for the specific language governing permissions and limitations under
# the License.


# Tested with Ubuntu 18.04
echo -e "HTTP/1.1 200 OK\n\n {\n \"master_password\" : \"SAPhxe123\"\n } " | nc -q 1 -l 0.0.0.0 1500
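The one-liner above serves a single HTTP response whose body is the JSON document the SAP HANA container fetches from its `--passwords-url`. A small sketch of that payload and how the `master_password` field could be extracted locally (the `sed` pattern is illustrative, not part of the commit):

```shell
# The JSON body served by saphana-password-server.sh.
payload='{ "master_password" : "SAPhxe123" }'

# Extract the master_password value; illustrative sed, not plugin code.
echo "$payload" | sed -n 's/.*"master_password" *: *"\([^"]*\)".*/\1/p'
# prints: SAPhxe123
```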

pom.xml

Lines changed: 1 addition & 0 deletions
@@ -38,6 +38,7 @@
     <module>aurora-mysql-plugin</module>
     <module>aurora-postgresql-plugin</module>
     <module>memsql-plugin</module>
+    <module>saphana-plugin</module>
   </modules>

   <licenses>

saphana-plugin/docs/SapHana-action.md

Lines changed: 32 additions & 0 deletions
# SAP HANA Action

Description
-----------
Action that runs a SAP HANA command.

Use Case
--------
The action can be used whenever you want to run a SAP HANA command before or after a data pipeline.
For example, you may want to run a SQL UPDATE command on a database before the pipeline source pulls data from tables.

Properties
----------
**Driver Name:** Name of the JDBC driver to use.

**Database Command:** Database command to execute.

**Host:** Host that SAP HANA is running on.

**Port:** Port that SAP HANA is running on.

**Database:** SAP HANA database name.

**Username:** User identity for connecting to the specified database.

**Password:** Password to use to connect to the specified database.

**Connection Arguments:** A list of arbitrary string key/value pairs as connection arguments. These arguments
will be passed to the JDBC driver as connection arguments for JDBC drivers that may need additional configurations.
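The Host, Port, and Database properties above typically combine into a SAP HANA JDBC URL. A minimal sketch (the URL shape is an assumption based on the standard SAP HANA JDBC driver, `com.sap.db.jdbc.Driver` in `ngdbc.jar`, not taken from this commit; the values are examples matching the docker-compose setup):

```shell
# Hypothetical example values matching the docker-compose environment above.
host=localhost
port=39017
database=HXE

# Standard SAP HANA JDBC URL shape.
echo "jdbc:sap://${host}:${port}/?databaseName=${database}"
# prints: jdbc:sap://localhost:39017/?databaseName=HXE
```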
Lines changed: 46 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,46 @@
# SAP HANA Batch Sink

Description
-----------
This sink is used whenever you need to write to a SAP HANA table.
Column names are autodetected from the input schema.

Use Case
--------
Suppose you periodically build a recommendation model for products on your online store.
The model is stored in a FileSet and you want to export the contents
of the FileSet to a SAP HANA table where it can be served to your users.

Properties
----------
**Reference Name:** Name used to uniquely identify this sink for lineage, annotating metadata, etc.

**Driver Name:** Name of the JDBC driver to use.

**Table Name:** Name of the table to export to.

**Host:** Host that SAP HANA is running on.

**Port:** Port that SAP HANA is running on.

**Database:** SAP HANA database name.

**Username:** User identity for connecting to the specified database.

**Password:** Password to use to connect to the specified database.

**Connection Arguments:** A list of arbitrary string key/value pairs as connection arguments. These arguments
will be passed to the JDBC driver as connection arguments for JDBC drivers that may need additional configurations.
Lines changed: 52 additions & 0 deletions
# SAP HANA Batch Source

Description
-----------
Reads from a SAP HANA database using a configurable SQL query.
Outputs one record for each row returned by the query.

Use Case
--------
The source is used whenever you need to read from a SAP HANA database. For example, you may want
to create daily snapshots of a database table by using this source and writing to
a TimePartitionedFileSet.

Properties
----------
**Reference Name:** Name used to uniquely identify this source for lineage, annotating metadata, etc.

**Driver Name:** Name of the JDBC driver to use.

**Host:** Host that SAP HANA is running on.

**Port:** Port that SAP HANA is running on.

**Database:** SAP HANA database name.

**Import Query:** The SELECT query to use to import data from the specified table.
You can specify an arbitrary number of columns to import, or import all columns using \*. The query should
contain the '$CONDITIONS' string. For example, 'SELECT * FROM table WHERE $CONDITIONS'.
The '$CONDITIONS' string will be replaced by 'splitBy' field limits specified by the bounding query.
The '$CONDITIONS' string is not required if numSplits is set to one.

**Bounding Query:** The bounding query should return the min and max of the values of the 'splitBy' field.
For example, 'SELECT MIN(id),MAX(id) FROM table'. Not required if numSplits is set to one.

**Split-By Field Name:** Field name which will be used to generate splits. Not required if numSplits is set to one.

**Number of Splits to Generate:** Number of splits to generate.

**Username:** User identity for connecting to the specified database.

**Password:** Password to use to connect to the specified database.

**Connection Arguments:** A list of arbitrary string key/value pairs as connection arguments. These arguments
will be passed to the JDBC driver as connection arguments for JDBC drivers that may need additional configurations.

**Schema:** The schema of records output by the source. This will be used in place of whatever schema comes
back from the query. However, it must match the schema that comes back from the query,
except it can mark fields as nullable and can contain a subset of the fields.
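The '$CONDITIONS' mechanism can be pictured as follows: the bounding query yields the min and max of the split-by field, the range is divided into numSplits pieces, and each split's query gets its own range predicate. A hypothetical shell sketch of that expansion (the names and exact predicate form are illustrative, not plugin internals):

```shell
# Illustrative only: expand $CONDITIONS into per-split range predicates.
query='SELECT * FROM table WHERE $CONDITIONS'
min=1; max=100; splits=2   # pretend 'SELECT MIN(id),MAX(id) FROM table' returned 1 and 100

step=$(( (max - min + 1) / splits ))
for i in $(seq 0 $(( splits - 1 ))); do
  lo=$(( min + i * step ))
  hi=$(( lo + step - 1 ))
  echo "${query/\$CONDITIONS/id >= $lo AND id <= $hi}"
done
# prints:
# SELECT * FROM table WHERE id >= 1 AND id <= 50
# SELECT * FROM table WHERE id >= 51 AND id <= 100
```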
Lines changed: 39 additions & 0 deletions
# SAP HANA Query Post-run Action

Description
-----------
Runs a SAP HANA query at the end of the pipeline run.
Can be configured to run only on success, only on failure, or always at the end of the run.

Use Case
--------
The action is used whenever you need to run a query at the end of a pipeline run.
For example, you may have a pipeline that imports data from a database table to
HDFS files. At the end of the run, you may want to run a query that deletes the data
that was read from the table.

Properties
----------
**Run Condition:** When to run the action. Must be 'completion', 'success', or 'failure'. Defaults to 'success'.
If set to 'completion', the action will be executed regardless of whether the pipeline run succeeded or failed.
If set to 'success', the action will only be executed if the pipeline run succeeded.
If set to 'failure', the action will only be executed if the pipeline run failed.

**Driver Name:** Name of the JDBC driver to use.

**Query:** Query to run.

**Host:** Host that SAP HANA is running on.

**Port:** Port that SAP HANA is running on.

**Database:** SAP HANA database name.

**Username:** User identity for connecting to the specified database.

**Password:** Password to use to connect to the specified database.

**Connection Arguments:** A list of arbitrary string key/value pairs as connection arguments. These arguments
will be passed to the JDBC driver as connection arguments for JDBC drivers that may need additional configurations.