Commit b2f16f3

update external function prefer cloud
1 parent d4b514f commit b2f16f3

1 file changed: +93 -77 lines changed

@@ -1,70 +1,54 @@
+
 ---
-title: External Function
+title: External Functions in Databend Cloud
 ---
 
-External functions in Databend allow you to define custom operations for processing data. These functions are implemented using an external server in programming languages such as Python. They represent an advanced form of custom operation, relying on an external server to define and execute custom data processing operations. Key features of external functions include:
-
-- Scalability: External functions are well-suited for handling complex and resource-intensive data operations, making them suitable for demanding processing tasks.
+# External Functions in Databend Cloud
 
-- External Libraries: They can utilize external libraries and dependencies, enhancing their capabilities and versatility by integrating additional functionality.
+External functions in Databend allow you to define custom operations for processing data using external servers written in programming languages like Python. These functions enable you to extend Databend's capabilities by integrating custom logic, leveraging external libraries, and handling complex processing tasks. Key features of external functions include:
 
-- Advanced Logic: External functions can implement advanced and sophisticated data processing logic, making them ideal for complex data processing scenarios.
+- **Scalability**: Ideal for complex and resource-intensive data operations.
+- **External Libraries**: Leverage additional functionality through external libraries and dependencies.
+- **Advanced Logic**: Implement sophisticated data processing logic for complex scenarios.
 
 ## Supported Programming Languages
 
-This table lists the supported languages and the required libraries for creating external functions in Databend:
+The following table lists the supported languages and the required libraries for creating external functions in Databend:
 
 | Language | Required Library |
 |----------|-------------------------------------------------------|
 | Python | [databend-udf](https://pypi.org/project/databend-udf) |
 
 ## Managing External Functions
 
-Databend provides a variety of commands to manage external functions. For details, see [External Function](/sql/sql-commands/ddl/external-function/).
-
-## Databend Settings for External Functions Server
-
-Databend provides the following settings to configure the external function server communication:
+You can manage external functions using SQL commands such as `CREATE FUNCTION`, `DROP FUNCTION`, and `SHOW FUNCTIONS`. For more details, see [External Function](/sql/sql-commands/ddl/external-function/).
 
-| Setting Name | Default Value | Description | Range |
-|--------------|---------------|------------------------------------------------|-------|
-| `external_server_connect_timeout_secs` | 10 | Connection timeout to external server | 0 to u64::MAX |
-| `external_server_request_timeout_secs` | 180 | Request timeout to external server | 0 to u64::MAX |
-| `external_server_request_batch_rows` | 65536 | Request batch rows to external server | 1 to u64::MAX |
+## Configuring External Functions in Databend Cloud
 
+To use external functions in Databend Cloud, you need to **allowlist the addresses of your external function servers**. The external function server must be accessible via a domain name over HTTPS. Please contact Databend Cloud support to add your allowed UDF server addresses:
 
-## Usage Examples
+1. Navigate to **Support** > **Create New Ticket** in the Databend Cloud console.
+2. Provide the external server addresses (with HTTPS domain names) you wish to allowlist.
+3. Submit the ticket and await confirmation from the support team.
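
The allowlisting requirement (a domain name reachable over HTTPS) can be sanity-checked locally before a ticket is opened. The following is a minimal sketch using only Python's standard library. The helper name `is_allowlist_candidate` and the sample addresses are hypothetical and are not part of Databend Cloud; the check only mirrors the wording of the requirement and does not predict what support will accept.

```python
import ipaddress
from urllib.parse import urlparse


def is_allowlist_candidate(address: str) -> bool:
    """Return True if the address uses HTTPS and names a domain, not a bare IP.

    Hypothetical helper: it reflects the stated requirement only.
    """
    parsed = urlparse(address)
    if parsed.scheme != "https" or not parsed.hostname:
        return False
    try:
        # If the host parses as an IP address, it is not a domain name.
        ipaddress.ip_address(parsed.hostname)
    except ValueError:
        return True
    return False


if __name__ == "__main__":
    # Sample addresses are placeholders for illustration.
    for addr in ["https://udf.example.com", "http://0.0.0.0:8815"]:
        print(addr, is_allowlist_candidate(addr))
```

An address such as `http://0.0.0.0:8815`, which the removed self-hosted configuration used, fails both conditions and would need an HTTPS domain in front of it.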
 
-This section demonstrates how to create an external function in each of the [Supported Programming Languages](#supported-programming-languages).
+## Usage Example: Creating an External Function in Python
 
-### Creating an External Function in Python
+This section demonstrates how to create an external function using Python.
 
-1. Before starting Databend, add the following parameters to the [query] section in your [databend-query.toml](https://github.com/datafuselabs/databend/blob/main/scripts/distribution/configs/databend-query.toml) configuration file.
-
-:::note
-If you are on Databend Cloud, skip this step and contact us with your allowed UDF server addresses by creating a ticket on **Support** > **Create New Ticket**.
-:::
-
-```toml title='databend-query.toml'
-[query]
-...
-enable_udf_server = true
-# List the allowed server addresses, separating multiple addresses with commas.
-# For example, ['http://0.0.0.0:8815', 'http://example.com']
-udf_server_allow_list = ['http://0.0.0.0:8815']
-...
-```
+### 1. Install the Required Library
 
-2. Install [databend-udf](https://pypi.org/project/databend-udf) using pip. If you haven't installed pip, you can download and install it following the official documentation: [Installing pip](https://pip.pypa.io/en/stable/installation/).
+Install the [databend-udf](https://pypi.org/project/databend-udf) library using `pip`:
 
 ```bash
 pip install databend-udf
 ```
 
-3. Define your function. This code defines and runs an external server in Python, which exposes a custom function *gcd* for calculating the greatest common divisor of two integers and allows remote execution of this function:
+### 2. Define Your Function
 
-```python title='external_function.py'
-from databend_udf import *
+Create a Python file (e.g., `external_function.py`) and define your external function. The following example defines an external server in Python that exposes a custom function `gcd` for calculating the greatest common divisor of two integers:
+
+```python
+from databend_udf import udf, UDFServer
 
 @udf(
     input_types=["INT", "INT"],
@@ -73,60 +57,92 @@ from databend_udf import *
 )
 def gcd(x: int, y: int) -> int:
     while y != 0:
-        (x, y) = (y, x % y)
+        x, y = y, x % y
     return x
 
 if __name__ == '__main__':
-    # create an external server listening at '0.0.0.0:8815'
+    # Create an external server listening at '0.0.0.0:8815'
     server = UDFServer("0.0.0.0:8815")
-    # add defined functions
+    # Add the defined function
     server.add_function(gcd)
-    # start the external server
+    # Start the external server
     server.serve()
 ```
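
Before the function is served, its logic can be checked locally without a running server. The sketch below assumes the same Euclidean loop as the `gcd` body shown in the diff, written as a plain function without the `@udf` decorator so it runs without `databend-udf` installed; comparing against `math.gcd` is an illustrative choice, not part of the documented workflow.

```python
import math


def gcd(x: int, y: int) -> int:
    """Same Euclidean loop as the UDF body, minus the @udf decorator."""
    while y != 0:
        x, y = y, x % y
    return x


if __name__ == "__main__":
    # Compare against the standard library on a few non-negative pairs.
    for a, b in [(48, 18), (17, 5), (0, 9), (12, 0)]:
        assert gcd(a, b) == math.gcd(a, b)
    print(gcd(48, 18))  # prints 6
```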
 
-`@udf` is a decorator used for defining external functions in Databend, supporting the following parameters:
+**Explanation of `@udf` Decorator Parameters:**
 
-| Parameter | Description |
-|--------------|-----------------------------------------------------------------------------------------------------|
-| input_types | A list of strings or Arrow data types that specify the input data types. |
-| result_type | A string or an Arrow data type that specifies the return value type. |
-| name | An optional string specifying the function name. If not provided, the original name will be used. |
-| io_threads | Number of I/O threads used per data chunk for I/O bound functions. |
-| skip_null | A boolean value specifying whether to skip NULL values. If set to True, NULL values will not be passed to the function, and the corresponding return value is set to NULL. Default is False. |
+| Parameter | Description |
+|--------------|------------------------------------------------------------------------------------------------------------------------------------------------------|
+| `input_types` | A list of strings specifying the input data types (e.g., `["INT", "VARCHAR"]`). |
+| `result_type` | A string specifying the return value type (e.g., `"INT"`). |
+| `name` | (Optional) Custom name for the function. If not provided, the original function name is used. |
+| `io_threads` | Number of I/O threads used per data chunk for I/O-bound functions. |
+| `skip_null` | If set to `True`, NULL values are not passed to the function, and the corresponding return value is set to NULL. Default is `False`. |
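
The `skip_null` behaviour described in the table can be pictured with a small plain-Python imitation, with `None` standing in for SQL NULL. This is a sketch under stated assumptions: the wrapper `skip_null_wrapper` is hypothetical and is not how `databend-udf` implements the option; it only illustrates the described semantics for a single row.

```python
from functools import wraps
from typing import Callable, Optional


def skip_null_wrapper(func: Callable[..., int]) -> Callable[..., Optional[int]]:
    """Hypothetical stand-in for skip_null=True on a per-row basis."""
    @wraps(func)
    def wrapper(*args):
        if any(arg is None for arg in args):
            # A NULL input is never passed to func; the result is NULL.
            return None
        return func(*args)
    return wrapper


@skip_null_wrapper
def gcd(x: int, y: int) -> int:
    while y != 0:
        x, y = y, x % y
    return x


if __name__ == "__main__":
    print(gcd(48, 18))
    print(gcd(None, 18))
```

With the option left at its default of `False`, the function body would instead have to handle `None` inputs itself.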
 
-This table illustrates the correspondence between Databend data types and their corresponding Python equivalents:
+**Data Type Mappings Between Databend and Python:**
 
 | Databend Type | Python Type |
-|-----------------------|-----------------------|
-| BOOLEAN | bool |
-| TINYINT (UNSIGNED) | int |
-| SMALLINT (UNSIGNED) | int |
-| INT (UNSIGNED) | int |
-| BIGINT (UNSIGNED) | int |
-| FLOAT | float |
-| DOUBLE | float |
-| DECIMAL | decimal.Decimal |
-| DATE | datetime.date |
-| TIMESTAMP | datetime.datetime |
-| VARCHAR | str |
-| VARIANT | any |
-| MAP(K,V) | dict |
-| ARRAY(T) | list[T] |
-| TUPLE(T...) | tuple(T...) |
-
-4. Run the Python file to start the external server:
-
-```shell
+|-----------------------|----------------------|
+| BOOLEAN | `bool` |
+| TINYINT (UNSIGNED) | `int` |
+| SMALLINT (UNSIGNED) | `int` |
+| INT (UNSIGNED) | `int` |
+| BIGINT (UNSIGNED) | `int` |
+| FLOAT | `float` |
+| DOUBLE | `float` |
+| DECIMAL | `decimal.Decimal` |
+| DATE | `datetime.date` |
+| TIMESTAMP | `datetime.datetime` |
+| VARCHAR | `str` |
+| VARIANT | `any` |
+| MAP(K,V) | `dict` |
+| ARRAY(T) | `list[T]` |
+| TUPLE(T,...) | `tuple(T,...)` |
+
+### 3. Run the External Server
+
+Run the Python file to start the external server:
+
+```bash
 python3 external_function.py
 ```
 
-5. Register the function *gcd* with the [CREATE FUNCTION](/sql/sql-commands/ddl/external-function/) in Databend:
+**Note:** Ensure that the server is accessible from Databend Cloud and that the address is allowlisted. If not already done, contact Databend Cloud support to add the server address to the allowlist.
+
+### 4. Register the Function in Databend Cloud
+
+Register the function `gcd` in Databend using the `CREATE FUNCTION` statement:
 
 ```sql
 CREATE FUNCTION gcd (INT, INT)
 RETURNS INT
-LANGUAGE python
-HANDLER = 'gcd'
-ADDRESS = 'http://0.0.0.0:8815';
-```
+LANGUAGE PYTHON
+HANDLER = 'gcd'
+ADDRESS = '<your-allowed-server-address>';
+```
+
+- Replace `<your-allowed-server-address>` with the actual address of your external server that has been allowlisted in Databend Cloud (must be an HTTPS domain).
+- The `HANDLER` specifies the name of the function as defined in your Python code.
+- The `ADDRESS` should match the address where your external server is running and must be allowlisted by Databend Cloud.
+
+**Example:**
+
+```sql
+CREATE FUNCTION gcd (INT, INT)
+RETURNS INT
+LANGUAGE PYTHON
+HANDLER = 'gcd'
+ADDRESS = 'https://your-server-address';
+```
+
+**Important:** Before executing this statement, ensure that `'https://your-server-address'` is allowlisted in Databend Cloud by contacting support.
+
+You can now use the external function `gcd` in your SQL queries:
+
+```sql
+SELECT gcd(48, 18); -- Returns 6
+```
+
+## Conclusion
+
+External functions in Databend Cloud provide a powerful way to extend the functionality of your data processing pipelines by integrating custom code written in languages like Python. By following the steps outlined above, you can create and use external functions to handle complex processing tasks, leverage external libraries, and implement advanced logic.
