Commit fa29e18

Merge branch 'main' into amrit/storage-cli
2 parents 24b88f5 + 82fbfeb commit fa29e18

50 files changed: +1529 -628 lines changed

.github/workflows/benchmarks.yml

Lines changed: 1 addition & 1 deletion
@@ -18,7 +18,7 @@ jobs:
   run:
     runs-on: ubuntu-latest
     steps:
-      - uses: actions/checkout@v4
+      - uses: actions/checkout@v5
       - name: Set up Python 3.13
         uses: actions/setup-python@v5
         with:

.github/workflows/release.yml

Lines changed: 1 addition & 1 deletion
@@ -17,7 +17,7 @@ jobs:
     runs-on: ubuntu-latest
     steps:
       - name: Check out the repository
-        uses: actions/checkout@v4
+        uses: actions/checkout@v5
         with:
           fetch-depth: 0

.github/workflows/tests-studio.yml

Lines changed: 2 additions & 2 deletions
@@ -62,15 +62,15 @@ jobs:
           echo "Studio branch: $STUDIO_BRANCH"

       - name: Check out Studio
-        uses: actions/checkout@v4
+        uses: actions/checkout@v5
         with:
           fetch-depth: 0
           repository: iterative/studio
           ref: ${{ env.STUDIO_BRANCH }}
           token: ${{ secrets.ITERATIVE_STUDIO_READ_ACCESS_TOKEN }}

       - name: Check out repository
-        uses: actions/checkout@v4
+        uses: actions/checkout@v5
         with:
           path: './backend/datachain'
           fetch-depth: 0

.github/workflows/tests.yml

Lines changed: 37 additions & 3 deletions
@@ -18,7 +18,7 @@ jobs:
     runs-on: ubuntu-latest
     steps:
       - name: Check out the repository
-        uses: actions/checkout@v4
+        uses: actions/checkout@v5
         with:
           fetch-depth: 0
           ref: ${{ github.event.pull_request.head.sha || github.ref }}
@@ -73,11 +73,29 @@ jobs:

     steps:
       - name: Check out the repository
-        uses: actions/checkout@v4
+        uses: actions/checkout@v5
         with:
           fetch-depth: 0
           ref: ${{ github.event.pull_request.head.sha || github.ref }}

+      - name: Setup PostgreSQL
+        if: runner.os != 'Windows'
+        uses: ikalnytskyi/action-setup-postgres@10ab8a56cc77b4823c2bfa57b1d4dd5605ef0481 # v7
+        with:
+          username: test
+          password: test
+          database: test_datachain
+          port: 5432
+          postgres-version: "17"
+        id: postgres
+
+      - name: Set PostgreSQL URI
+        if: runner.os != 'Windows'
+        run: |
+          FULL_URI="${{ steps.postgres.outputs.connection-uri }}"
+          echo "TEST_POSTGRES_URI=${FULL_URI%/*}" >> "$GITHUB_ENV"
+        shell: bash
+
       - name: Set up Python ${{ matrix.pyv }}
         uses: actions/setup-python@v5
         with:
@@ -157,7 +175,7 @@ jobs:
           - {os: ubuntu-latest-4-cores, pyv: "3.13", group: multimodal}

     steps:
-      - uses: actions/checkout@v4
+      - uses: actions/checkout@v5
         with:
           ref: ${{ github.event.pull_request.head.sha || github.ref }}

@@ -176,6 +194,22 @@ jobs:
       - name: Install nox
         run: uv pip install nox --system

+      - name: Install FFmpeg on Windows
+        if: runner.os == 'Windows'
+        run: choco install ffmpeg
+
+      - name: Install FFmpeg on macOS
+        if: runner.os == 'macOS'
+        run: |
+          brew install ffmpeg
+          echo 'DYLD_FALLBACK_LIBRARY_PATH=/opt/homebrew/lib' >> "$GITHUB_ENV"
+
+      - name: Install FFmpeg on Ubuntu
+        if: runner.os == 'Linux'
+        run: |
+          sudo apt update
+          sudo apt install -y ffmpeg
+
       - name: Set hf token
         if: matrix.group == 'llm_and_nlp'
         run: echo 'HF_TOKEN=${{ secrets.HF_TOKEN }}' >> "$GITHUB_ENV"
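
Review note on the `Set PostgreSQL URI` step in this workflow: the shell expansion `${FULL_URI%/*}` removes the shortest suffix matching `/*`, that is, the last `/` and everything after it, which here drops the database name from the connection URI. A minimal Python sketch of the same trimming follows. The helper name `strip_last_segment` and the sample URI are assumptions for illustration (the URI is assembled from the username, password, port, and database configured in the step, not taken from real action output).

```python
def strip_last_segment(uri: str) -> str:
    """Mimic the shell expansion ${uri%/*}: drop the last '/' and what follows it."""
    head, sep, _tail = uri.rpartition("/")
    # With no '/' present the shell pattern does not match, so the value is unchanged.
    return head if sep else uri


# Assumed example value; the real one comes from steps.postgres.outputs.connection-uri.
full_uri = "postgresql://test:test@localhost:5432/test_datachain"
server_uri = strip_last_segment(full_uri)
print(server_uri)
```

Under that assumption, `TEST_POSTGRES_URI` would hold the server-level URI without a database name, presumably so the tests can pick or create databases themselves.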

.github/workflows/update-template.yaml

Lines changed: 1 addition & 1 deletion
@@ -11,7 +11,7 @@ jobs:
     runs-on: ubuntu-latest
     steps:
       - name: Check out the repository
-        uses: actions/checkout@v4
+        uses: actions/checkout@v5

       - name: Update template
         uses: iterative/py-template@main

.gitignore

Lines changed: 4 additions & 0 deletions
@@ -145,3 +145,7 @@ cython_debug/
 *.pt

 .DS_Store/
+
+# for local dev, e.g. LLM generated files, .env.test to override
+# test variables, local scripts to try, etc
+local/

.pre-commit-config.yaml

Lines changed: 1 addition & 1 deletion
@@ -24,7 +24,7 @@ repos:
       - id: trailing-whitespace
         exclude: '^LICENSES/'
   - repo: https://github.com/astral-sh/ruff-pre-commit
-    rev: 'v0.12.8'
+    rev: 'v0.12.10'
     hooks:
       - id: ruff
         args: [--fix, --exit-non-zero-on-fix]

docs/references/func.md

Lines changed: 35 additions & 2 deletions
@@ -1,5 +1,38 @@
 # Functions

-Use built-in functions for data manipulation and analysis to operate on the underlying database storing the chain data. These functions are useful for operations like [`DataChain.filter`](datachain.md#datachain.lib.dc.DataChain.filter) and [`DataChain.mutate`](datachain.md#datachain.lib.dc.DataChain.mutate). Import these functions from `datachain.func`.
+Use built-in functions for data manipulation and analysis to operate on the underlying database storing the chain data. These functions are useful for operations like [`DataChain.filter`](datachain.md#datachain.lib.dc.DataChain.filter) and [`DataChain.mutate`](datachain.md#datachain.lib.dc.DataChain.mutate).

-::: datachain.func
+Functions are organized by category and accessed through their respective modules. For example, string functions are accessed via `func.string.length()`, array functions via `func.array.contains()`, etc.
+
+!!! note "Global Function Access"
+    Only a subset of functions are available directly from `datachain.func` (e.g., `func.length`). Most functions should be accessed through their specific module namespace (e.g., `func.string.length`) to avoid naming conflicts.
+
+## Function Categories
+
+DataChain provides several categories of functions for different types of operations:
+
+- **[Aggregate Functions](functions/aggregate.md)** - Functions for aggregating data like `sum`, `count`, `avg`, etc.
+- **[Array Functions](functions/array.md)** - Functions for working with arrays and lists
+- **[Conditional Functions](functions/conditional.md)** - Functions for conditional logic like `ifelse`, `case`, etc.
+- **[Numeric Functions](functions/numeric.md)** - Functions for numeric operations and computations
+- **[Path Functions](functions/path.md)** - Functions for working with file paths
+- **[Random Functions](functions/random.md)** - Functions for generating random values
+- **[String Functions](functions/string.md)** - Functions for string manipulation and processing
+- **[Window Functions](functions/window.md)** - Functions for window operations
+
+## Usage
+
+```python
+from datachain.func import aggregate, array, conditional, numeric, path, random, string, window
+
+# Access functions through their module namespaces
+dc.mutate(
+    text_length=string.length("text_column"),
+    contains_item=array.contains("array_column", "value"),
+    file_extension=path.file_ext("file_path")
+)
+
+# Some commonly used functions are also available directly
+from datachain.func import sum, count, length, ifelse
+dc.mutate(total=sum("amount"))
+```
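
Review note on the "naming conflicts" rationale in the added docs: a direct import such as `from datachain.func import sum` rebinds the bare name `sum` in the importing module, which hides Python's built-in of the same name. The sketch below illustrates that effect without requiring DataChain. The `aggregate` object is a hypothetical stand-in built with `SimpleNamespace`, not the real `datachain.func.aggregate`, and its `sum` only returns a label string.

```python
import builtins
from types import SimpleNamespace

# Hypothetical stand-in for a namespaced function module (assumption, not DataChain).
aggregate = SimpleNamespace(sum=lambda column: f"sum({column})")

# Namespaced access leaves the built-in sum usable under its bare name.
namespaced_expr = aggregate.sum("amount")
builtin_total = sum([1, 2, 3])


def shadowed_lookup():
    # Rebinding the bare name, as a direct import would, hides the built-in here;
    # the original is still reachable through the builtins module.
    sum = aggregate.sum
    return sum("amount"), builtins.sum([1, 2, 3])


print(namespaced_expr, builtin_total, shadowed_lookup())
```

This is one reading of why the docs steer users toward `func.string.length`-style access; the commit itself does not spell out which conflicts it has in mind.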
Lines changed: 5 additions & 0 deletions
@@ -0,0 +1,5 @@
+# Aggregate Functions
+
+Aggregate functions perform calculations on sets of values and return a single result.
+
+::: datachain.func.aggregate

docs/references/functions/array.md

Lines changed: 5 additions & 0 deletions
@@ -0,0 +1,5 @@
+# Array Functions
+
+Functions for working with arrays and lists, including operations like distance calculations and array manipulation.
+
+::: datachain.func.array
