| 1 | +# MariaDB Batch Source |
| 2 | + |
| 3 | + |
| 4 | +Description |
| 5 | +----------- |
| 6 | +Reads from a MariaDB instance using a configurable SQL query. |
| 7 | +Outputs one record for each row returned by the query. |
| 8 | + |
| 9 | + |
| 10 | +Use Case |
| 11 | +-------- |
| 12 | +The source is used whenever you need to read from a MariaDB instance. For example, you may want |
| 13 | +to create daily snapshots of a database table by using this source and writing to |
| 14 | +a TimePartitionedFileSet. |
| 15 | + |
| 16 | + |
| 17 | +Properties |
| 18 | +---------- |
| 19 | +**Reference Name:** Name used to uniquely identify this source for lineage, annotating metadata, etc. |
| 20 | + |
| 21 | +**Driver Name:** Name of the JDBC driver to use. |
| 22 | + |
| 23 | +**Host:** Host that MariaDB is running on. |
| 24 | + |
| 25 | +**Port:** Port that MariaDB is running on. |
| 26 | + |
| 27 | +**Database:** MariaDB database name. |
| 28 | + |
| 29 | +**Import Query:** The SELECT query to use to import data from the specified table. |
| 30 | +You can specify an arbitrary number of columns to import, or import all columns using \*. The query should |
| 31 | +contain the '$CONDITIONS' string. For example, 'SELECT * FROM table WHERE $CONDITIONS'. |
| 32 | +The '$CONDITIONS' string will be replaced by the 'splitBy' field limits derived from the bounding query. |
| 33 | +The '$CONDITIONS' string is not required if numSplits is set to one. |
| 34 | + |
| 35 | +**Bounding Query:** Query that returns the minimum and maximum values of the 'splitBy' field. |
| 36 | +For example, 'SELECT MIN(id),MAX(id) FROM table'. Not required if numSplits is set to one. |
| 37 | + |
| 38 | +**Split-By Field Name:** Name of the field used to generate splits. Not required if numSplits is set to one. |
| 39 | + |
| 40 | +**Number of Splits to Generate:** Number of splits to generate. |
| 41 | + |
| 42 | +**Username:** User identity for connecting to the specified database. |
| 43 | + |
| 44 | +**Password:** Password to use to connect to the specified database. |
| 45 | + |
| 46 | +**Connection Arguments:** A list of arbitrary string key/value pairs. These are passed to the JDBC driver |
| 47 | +as connection arguments, for drivers that need additional configuration. |
| 48 | + |
| 49 | +**Auto Reconnect:** Whether the driver should try to re-establish stale or dead connections. |
| 50 | + |
| 51 | +**Schema:** The schema of records output by the source. This will be used in place of whatever schema comes |
| 52 | +back from the query. However, it must match the schema that comes back from the query, |
| 53 | +except it can mark fields as nullable and can contain a subset of the fields. |
| 54 | + |
| 55 | +**Use SSL:** Turns on SSL encryption. The connection will fail if SSL is not available. |
| 56 | + |
| 57 | +**Keystore URL:** URL to the client certificate KeyStore (if not specified, defaults are used). Must be accessible at the |
| 58 | +same location on the host where CDAP Master is running and on all hosts on which at least one HDFS, MapReduce, or YARN daemon |
| 59 | +role is running. |
| 60 | + |
| 61 | +**Keystore Password:** Password for the client certificate KeyStore. |
| 62 | + |
| 63 | +**Truststore URL:** URL to the trusted root certificate KeyStore (if not specified, defaults are used). Must be accessible at |
| 64 | +the same location on the host where CDAP Master is running and on all hosts on which at least one HDFS, MapReduce, or YARN |
| 65 | +daemon role is running. |
| 66 | + |
| 67 | +**Truststore Password:** Password for the trusted root certificate KeyStore. |
| 68 | + |
| 69 | +**Use Compression:** Use zlib compression when communicating with the server. Select this option for WAN |
| 70 | +connections. |
| 71 | + |
| 72 | +**Use ANSI Quotes:** Treats " as an identifier quote character and not as a string quote character. |
| 73 | + |
| 74 | +**SQL_MODE:** Override the default SQL_MODE session variable used by the server. |
| 75 | + |
| 76 | + |
| 77 | +Data Types Mapping |
| 78 | +---------- |
| 79 | + |
| 80 | + +--------------------------------+-----------------------+------------------------------------+ |
| 81 | + | MariaDB Data Type | CDAP Schema Data Type | Comment | |
| 82 | + +--------------------------------+-----------------------+------------------------------------+ |
| 83 | + | TINYINT | int | | |
| 84 | + | BOOLEAN, BOOL | boolean | | |
| 85 | + | SMALLINT | int | | |
| 86 | + | MEDIUMINT | int | | |
| 87 | + | INT, INTEGER | int | | |
| 88 | + | BIGINT | long | | |
| 89 | + | DECIMAL, DEC, NUMERIC, FIXED | decimal | | |
| 90 | + | FLOAT | float | | |
| 91 | + | DOUBLE, DOUBLE PRECISION, REAL | double | | |
| 92 | + | BIT | boolean | | |
| 93 | + | CHAR | string | | |
| 94 | + | VARCHAR | string | | |
| 95 | + | BINARY | bytes | | |
| 96 | + | CHAR BYTE | bytes | | |
| 97 | + | VARBINARY | bytes | | |
| 98 | + | TINYBLOB | bytes | | |
| 99 | + | BLOB | bytes | | |
| 100 | + | MEDIUMBLOB | bytes | | |
| 101 | + | LONGBLOB | bytes | | |
| 102 | + | TINYTEXT | string | | |
| 103 | + | TEXT | string | | |
| 104 | + | MEDIUMTEXT | string | | |
| 105 | + | LONGTEXT | string | | |
| 106 | + | JSON | string | In MariaDB, an alias for LONGTEXT | |
| 107 | + | ENUM | string | Mapped to string by default | |
| 108 | + | SET | string | | |
| 109 | + | DATE | date | | |
| 110 | + | TIME | time_micros | | |
| 111 | + | DATETIME | timestamp_micros | | |
| 112 | + | TIMESTAMP | timestamp_micros | | |
| 113 | + | YEAR | date | | |
| 114 | + +--------------------------------+-----------------------+------------------------------------+ |
| 115 | + |
| 116 | + |
| 117 | +Example |
| 118 | +------ |
| 119 | +Suppose you want to read data from a MariaDB database named "prod" that is running on "localhost", port 3306, |
| 120 | +as the "root" user with the password "root". Configure the plugin with: |
| 121 | + |
| 122 | + |
| 123 | +``` |
| 124 | +Reference Name: "src1" |
| 125 | +Driver Name: "mariadb" |
| 126 | +Host: "localhost" |
| 127 | +Port: 3306 |
| 128 | +Database: "prod" |
| 129 | +Import Query: "select id, name, email, phone from users;" |
| 130 | +Number of Splits to Generate: 1 |
| 131 | +Username: "root" |
| 132 | +Password: "root" |
| 133 | +``` |
| 134 | + |
| 135 | +For example, if the 'id' column is a primary key of type int and the other columns are |
| 136 | +non-nullable varchars, output records will have this schema: |
| 137 | + |
| 138 | + +----------------+---------------------+ |
| 139 | + | Field Name | Type | |
| 140 | + +----------------+---------------------+ |
| 141 | + | id | int | |
| 142 | + | name | string | |
| 143 | + | email | string | |
| 144 | + | phone | string | |
| 145 | + +----------------+---------------------+ |
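
The interaction between Import Query, Bounding Query, Split-By Field Name, and Number of Splits described under Properties can be illustrated with a small sketch. This is not the plugin's implementation. It is a simplified model that assumes an integer 'splitBy' field and evenly sized half-open ranges. The function name `split_queries` and the `1 = 1` placeholder for the single-split case are hypothetical choices made for illustration.

```python
def split_queries(import_query, split_by, min_value, max_value, num_splits):
    """Return one query per split, replacing $CONDITIONS with a range predicate.

    min_value and max_value stand in for the result of the bounding query,
    e.g. 'SELECT MIN(id),MAX(id) FROM table'. Integer bounds are assumed.
    """
    if max_value < min_value:
        raise ValueError("max_value must be >= min_value")

    if num_splits <= 1:
        # A single split reads everything; a tautology keeps a WHERE clause valid
        # if the query happens to contain $CONDITIONS.
        return [import_query.replace("$CONDITIONS", "1 = 1")]

    total = max_value - min_value + 1
    # Never create more splits than there are distinct integer values.
    num_splits = min(num_splits, total)
    base, extra = divmod(total, num_splits)

    queries = []
    lower = min_value
    for i in range(num_splits):
        # The first `extra` splits take one additional value each.
        size = base + (1 if i < extra else 0)
        upper = lower + size  # exclusive upper bound
        condition = f"{split_by} >= {lower} AND {split_by} < {upper}"
        queries.append(import_query.replace("$CONDITIONS", condition))
        lower = upper
    return queries


if __name__ == "__main__":
    for query in split_queries(
        "SELECT * FROM table WHERE $CONDITIONS", "id", 1, 100, 4
    ):
        print(query)
```

In this model each split reads a disjoint range of 'id' values, which is why the import query needs the '$CONDITIONS' placeholder whenever more than one split is requested.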