External functions in Databend allow you to define custom operations for processing data. These functions are implemented using an external server in programming languages such as Python. They represent an advanced form of custom operation, relying on an external server to define and execute custom data processing operations. Key features of external functions include:
6
-
7
-
- Scalability: External functions are well-suited for handling complex and resource-intensive data operations, making them suitable for demanding processing tasks.
6
+
# External Functions in Databend Cloud
8
7
9
-
- External Libraries: They can utilize external libraries and dependencies, enhancing their capabilities and versatility by integrating additional functionality.
8
+
External functions in Databend allow you to define custom operations for processing data using external servers written in programming languages like Python. These functions enable you to extend Databend's capabilities by integrating custom logic, leveraging external libraries, and handling complex processing tasks. Key features of external functions include:
10
9
11
-
- Advanced Logic: External functions can implement advanced and sophisticated data processing logic, making them ideal for complex data processing scenarios.
10
+
- **Scalability**: Ideal for complex and resource-intensive data operations.
11
+
- **External Libraries**: Leverage additional functionality through external libraries and dependencies.
12
+
- **Advanced Logic**: Implement sophisticated data processing logic for complex scenarios.
12
13
13
14
## Supported Programming Languages
14
15
15
-
This table lists the supported languages and the required libraries for creating external functions in Databend:
16
+
The following table lists the supported languages and the required libraries for creating external functions in Databend:
Databend provides a variety of commands to manage external functions. For details, see [External Function](/sql/sql-commands/ddl/external-function/).
24
-
25
-
## Databend Settings for External Functions Server
26
-
27
-
Databend provides the following settings to configure the external function server communication:
24
+
You can manage external functions using SQL commands such as `CREATE FUNCTION`, `DROP FUNCTION`, and `SHOW FUNCTIONS`. For more details, see [External Function](/sql/sql-commands/ddl/external-function/).
28
25
29
-
| Setting Name | Default Value | Description | Range |
|`external_server_connect_timeout_secs`| 10 | Connection timeout to external server | 0 to u64::MAX |
32
-
|`external_server_request_timeout_secs`| 180 | Request timeout to external server | 0 to u64::MAX |
33
-
|`external_server_request_batch_rows`| 65536 | Request batch rows to external server | 1 to u64::MAX |
26
+
## Configuring External Functions in Databend Cloud
34
27
28
+
To use external functions in Databend Cloud, you need to **allowlist the addresses of your external function servers**. The external function server must be accessible via a domain name over HTTPS. Please contact Databend Cloud support to add your allowed UDF server addresses:
35
29
36
-
## Usage Examples
30
+
1. Navigate to **Support** > **Create New Ticket** in the Databend Cloud console.
31
+
2. Provide the external server addresses (with HTTPS domain names) you wish to allowlist.
32
+
3. Submit the ticket and await confirmation from the support team.
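Before opening the ticket, it can help to check that each address meets the stated requirement (a domain name served over HTTPS). The following is a minimal sketch using only the Python standard library; the helper name `looks_allowlistable` and the hostname `udf.example.com` are hypothetical, and the check only approximates the requirement rather than reproducing any validation performed by Databend Cloud.

```python
import ipaddress
from urllib.parse import urlparse


def looks_allowlistable(address: str) -> bool:
    """Return True if the address uses HTTPS and a domain name (not an IP literal)."""
    parsed = urlparse(address)
    host = parsed.hostname
    # Require the https scheme and a non-empty host.
    if parsed.scheme != "https" or not host:
        return False
    try:
        # If the host parses as an IP address, it is not a domain name.
        ipaddress.ip_address(host)
    except ValueError:
        return True
    return False


# Hypothetical addresses, for illustration only.
print(looks_allowlistable("https://udf.example.com"))
print(looks_allowlistable("http://0.0.0.0:8815"))
```

The local development address `http://0.0.0.0:8815` used elsewhere on this page fails the check on both counts: it is plain HTTP and an IP address.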
37
33
38
-
This section demonstrates how to create an external function in each of the [Supported Programming Languages](#supported-programming-languages).
34
+
## Usage Example: Creating an External Function in Python
39
35
40
-
### Creating an External Function in Python
36
+
This section demonstrates how to create an external function using Python.
41
37
42
-
1. Before starting Databend, add the following parameters to the [query] section in your [databend-query.toml](https://github.com/datafuselabs/databend/blob/main/scripts/distribution/configs/databend-query.toml) configuration file.
43
-
44
-
:::note
45
-
If you are on Databend Cloud, skip this step and contact us with your allowed UDF server addresses by creating a ticket on **Support** > **Create New Ticket**.
46
-
:::
47
-
48
-
```toml title='databend-query.toml'
49
-
[query]
50
-
...
51
-
enable_udf_server = true
52
-
# List the allowed server addresses, separating multiple addresses with commas.
53
-
# For example, ['http://0.0.0.0:8815', 'http://example.com']
54
-
udf_server_allow_list = ['http://0.0.0.0:8815']
55
-
...
56
-
```
38
+
### 1. Install the Required Library
57
39
58
-
2. Install [databend-udf](https://pypi.org/project/databend-udf) using pip. If you haven't installed pip, you can download and install it following the official documentation: [Installing pip](https://pip.pypa.io/en/stable/installation/).
40
+
Install the [databend-udf](https://pypi.org/project/databend-udf) library using `pip`:
59
41
60
42
```bash
61
43
pip install databend-udf
62
44
```
63
45
64
-
3. Define your function. This code defines and runs an external server in Python, which exposes a custom function *gcd* for calculating the greatest common divisor of two integers and allows remote execution of this function:
46
+
### 2. Define Your Function
65
47
66
-
```python title='external_function.py'
67
-
from databend_udf import *
48
+
Create a Python file (e.g., `external_function.py`) and define your external function. The following example defines an external server in Python that exposes a custom function `gcd` for calculating the greatest common divisor of two integers:
49
+
50
+
```python
51
+
from databend_udf import udf, UDFServer
68
52
69
53
@udf(
70
54
    input_types=["INT", "INT"],
@@ -73,60 +57,92 @@ from databend_udf import *
73
57
)
74
58
def gcd(x: int, y: int) -> int:
75
59
    while y != 0:
76
-
        (x, y) = (y, x % y)
60
+
        x, y = y, x % y
77
61
    return x
78
62
79
63
if __name__ == '__main__':
80
-
    # create an external server listening at '0.0.0.0:8815'
64
+
    # Create an external server listening at '0.0.0.0:8815'
81
65
    server = UDFServer("0.0.0.0:8815")
82
-
    # add defined functions
66
+
    # Add the defined function
83
67
    server.add_function(gcd)
84
-
    # start the external server
68
+
    # Start the external server
85
69
    server.serve()
86
70
```
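Because the handler body is ordinary Python, its logic can be checked as a plain function before it is wrapped with `@udf` and served. The sketch below repeats the `gcd` loop from the example without the decorator and compares it against the standard library's `math.gcd` for small non-negative inputs; the tested range is an arbitrary choice.

```python
import math


def gcd(x: int, y: int) -> int:
    # Same Euclidean loop as the external function, without the @udf decorator.
    while y != 0:
        x, y = y, x % y
    return x


# Compare against the standard library for small non-negative inputs.
for a in range(50):
    for b in range(50):
        assert gcd(a, b) == math.gcd(a, b)

print(gcd(48, 18))  # 6
```

With negative arguments this loop can return a negative value (for example, `gcd(4, -6)` evaluates to `-2`), whereas `math.gcd` always returns a non-negative result.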
87
71
88
-
`@udf` is a decorator used for defining external functions in Databend, supporting the following parameters:
| input_types | A list of strings or Arrow data types that specify the input data types.|
93
-
| result_type | A string or an Arrow data type that specifies the return value type.|
94
-
| name |An optional string specifying the function name. If not provided, the original name will be used. |
95
-
| io_threads | Number of I/O threads used per data chunk for I/O bound functions. |
96
-
| skip_null |A boolean value specifying whether to skip NULL values. If set to True, NULL values will not be passed to the function, and the corresponding return value is set to NULL. Default is False.|
|`input_types`| A list of strings specifying the input data types (e.g., `["INT", "VARCHAR"]`). |
77
+
|`result_type`| A string specifying the return value type (e.g., `"INT"`). |
78
+
|`name`|(Optional) Custom name for the function. If not provided, the original function name is used.|
79
+
|`io_threads`| Number of I/O threads used per data chunk for I/O-bound functions.|
80
+
|`skip_null`| If set to `True`, NULL values are not passed to the function, and the corresponding return value is set to NULL. Default is `False`. |
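The `skip_null` behavior described in the table can be illustrated in plain Python. The sketch below emulates the documented semantics only; the helper `call_with_skip_null` is hypothetical and is not how `databend-udf` implements the option.

```python
from typing import Any, Callable, List, Optional, Sequence, Tuple


def call_with_skip_null(
    func: Callable[..., Any], rows: Sequence[Tuple[Optional[Any], ...]]
) -> List[Optional[Any]]:
    """Apply func row by row, returning None for any row that contains a NULL (None)."""
    results = []
    for args in rows:
        if any(arg is None for arg in args):
            # The function is not called; the result for this row is NULL.
            results.append(None)
        else:
            results.append(func(*args))
    return results


def gcd(x: int, y: int) -> int:
    while y != 0:
        x, y = y, x % y
    return x


rows = [(48, 18), (None, 5), (7, None), (21, 14)]
print(call_with_skip_null(gcd, rows))  # [6, None, None, 7]
```

With the default `skip_null=False`, the table implies that NULL values are passed to the function, so the function body would need to handle `None` arguments itself.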
97
81
98
-
This table illustrates the correspondence between Databend data types and their corresponding Python equivalents:
82
+
**Data Type Mappings Between Databend and Python:**
99
83
100
84
| Databend Type | Python Type |
101
-
|-----------------------|-----------------------|
102
-
| BOOLEAN | bool |
103
-
| TINYINT (UNSIGNED) | int |
104
-
| SMALLINT (UNSIGNED) | int |
105
-
| INT (UNSIGNED) | int |
106
-
| BIGINT (UNSIGNED) | int |
107
-
| FLOAT | float |
108
-
| DOUBLE | float |
109
-
| DECIMAL | decimal.Decimal |
110
-
| DATE | datetime.date |
111
-
| TIMESTAMP | datetime.datetime |
112
-
| VARCHAR | str |
113
-
| VARIANT | any |
114
-
| MAP(K,V) | dict |
115
-
| ARRAY(T) | list[T]|
116
-
| TUPLE(T...) | tuple(T...) |
117
-
118
-
4. Run the Python file to start the external server:
119
-
120
-
```shell
85
+
|-----------------------|----------------------|
86
+
| BOOLEAN |`bool`|
87
+
| TINYINT (UNSIGNED) |`int`|
88
+
| SMALLINT (UNSIGNED) |`int`|
89
+
| INT (UNSIGNED) |`int`|
90
+
| BIGINT (UNSIGNED) |`int`|
91
+
| FLOAT |`float`|
92
+
| DOUBLE |`float`|
93
+
| DECIMAL |`decimal.Decimal`|
94
+
| DATE |`datetime.date`|
95
+
| TIMESTAMP |`datetime.datetime`|
96
+
| VARCHAR |`str`|
97
+
| VARIANT |`any`|
98
+
| MAP(K,V) |`dict`|
99
+
| ARRAY(T) |`list[T]`|
100
+
| TUPLE(T,...) |`tuple(T,...)`|
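The mapping above can also be written down as a small lookup table, which is handy when checking test inputs for a function by hand. This is a sketch using only the standard library: the dictionary `PYTHON_TYPE_FOR` and the helper `matches` are hypothetical names, `object` stands in for VARIANT's "any", and element types of containers are not checked.

```python
import datetime
import decimal

# Hypothetical lookup table mirroring the documented type mapping.
PYTHON_TYPE_FOR = {
    "BOOLEAN": bool,
    "TINYINT": int,
    "SMALLINT": int,
    "INT": int,
    "BIGINT": int,
    "FLOAT": float,
    "DOUBLE": float,
    "DECIMAL": decimal.Decimal,
    "DATE": datetime.date,
    "TIMESTAMP": datetime.datetime,
    "VARCHAR": str,
    "VARIANT": object,  # any Python value
    "MAP": dict,
    "ARRAY": list,
    "TUPLE": tuple,
}


def matches(databend_type: str, value: object) -> bool:
    """Return True if value is an instance of the Python type mapped to databend_type."""
    return isinstance(value, PYTHON_TYPE_FOR[databend_type])


print(matches("DECIMAL", decimal.Decimal("1.50")))
print(matches("VARCHAR", 42))
```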
101
+
102
+
### 3. Run the External Server
103
+
104
+
Run the Python file to start the external server:
105
+
106
+
```bash
121
107
python3 external_function.py
122
108
```
123
109
124
-
5. Register the function *gcd* with the [CREATE FUNCTION](/sql/sql-commands/ddl/external-function/) in Databend:
110
+
**Note:** Ensure that the server is accessible from Databend Cloud and that the address is allowlisted. If not already done, contact Databend Cloud support to add the server address to the allowlist.
111
+
112
+
### 4. Register the Function in Databend Cloud
113
+
114
+
Register the function `gcd` in Databend using the `CREATE FUNCTION` statement:
125
115
126
116
```sql
127
117
CREATE FUNCTION gcd (INT, INT)
128
118
RETURNS INT
129
-
LANGUAGE python
130
-
HANDLER = 'gcd'
131
-
ADDRESS = 'http://0.0.0.0:8815';
132
-
```
119
+
LANGUAGE PYTHON
120
+
HANDLER = 'gcd'
121
+
ADDRESS = '<your-allowed-server-address>';
122
+
```
123
+
124
+
- Replace `<your-allowed-server-address>` with the actual address of your external server that has been allowlisted in Databend Cloud (must be an HTTPS domain).
125
+
- The `HANDLER` specifies the name of the function as defined in your Python code.
126
+
- The `ADDRESS` should match the address where your external server is running and must be allowlisted by Databend Cloud.
127
+
128
+
**Example:**
129
+
130
+
```sql
131
+
CREATE FUNCTION gcd (INT, INT)
132
+
RETURNS INT
133
+
LANGUAGE PYTHON
134
+
HANDLER = 'gcd'
135
+
ADDRESS = 'https://your-server-address';
136
+
```
137
+
138
+
**Important:** Before executing this statement, ensure that `'https://your-server-address'` is allowlisted in Databend Cloud by contacting support.
139
+
140
+
You can now use the external function `gcd` in your SQL queries:
141
+
142
+
```sql
143
+
SELECT gcd(48, 18); -- Returns 6
144
+
```
145
+
146
+
## Conclusion
147
+
148
+
External functions in Databend Cloud provide a powerful way to extend the functionality of your data processing pipelines by integrating custom code written in languages like Python. By following the steps outlined above, you can create and use external functions to handle complex processing tasks, leverage external libraries, and implement advanced logic.