Merged
Changes from 11 commits
261 changes: 261 additions & 0 deletions docs/source/user-guide/sql/aggregate_functions.md
@@ -618,6 +618,36 @@ regr_avgx(expression_y, expression_x)
- **expression_y**: Dependent variable expression to operate on. Can be a constant, column, or function, and any combination of operators.
- **expression_x**: Independent variable expression to operate on. Can be a constant, column, or function, and any combination of operators.

### Example

Consider the following table:

```sql
> create table daily_sales(day int, total_sales int) as values (1,100), (2,150), (3,200), (4,NULL), (5,250);
> select * from daily_sales;
+-----+-------------+
| day | total_sales |
+-----+-------------+
| 1   | 100         |
| 2   | 150         |
| 3   | 200         |
| 4   | NULL        |
| 5   | 250         |
+-----+-------------+
```

```sql
SELECT regr_avgx(total_sales, day) AS avg_day -- day is the independent variable (x)
FROM daily_sales; -- day 4 is skipped because total_sales is NULL: (1 + 2 + 3 + 5) / 4 = 2.75
+---------+
| avg_day |
+---------+
| 2.75    |
+---------+
1 row(s) fetched.
Elapsed 0.001 seconds.
```
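
The difference between `regr_avgx` and a plain average can be checked by hand. The Python sketch below only illustrates the arithmetic and is not DataFusion's implementation; it assumes the rows displayed for `daily_sales` above, uses `None` for SQL `NULL`, and `non_null_pairs` is a hypothetical helper name.

```python
# Rows as displayed for daily_sales: (day, total_sales); None stands in for SQL NULL.
rows = [(1, 100), (2, 150), (3, 200), (4, None), (5, 250)]


def non_null_pairs(pairs):
    """Keep only the pairs in which both values are present (hypothetical helper)."""
    return [(x, y) for x, y in pairs if x is not None and y is not None]


pairs = non_null_pairs(rows)

# Average of x over the surviving pairs only: (1 + 2 + 3 + 5) / 4
avg_x = sum(x for x, _ in pairs) / len(pairs)

# Average of x over every row, ignoring whether y is NULL: (1 + 2 + 3 + 4 + 5) / 5
plain_avg = sum(x for x, _ in rows) / len(rows)

print(avg_x)      # 2.75
print(plain_avg)  # 3.0
```

The two results differ because day 4 has no `total_sales` value, so its pair is dropped before averaging.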

### `regr_avgy`

Computes the average of the dependent variable (output) expression_y for the non-null paired data points.
@@ -631,6 +661,34 @@ regr_avgy(expression_y, expression_x)
- **expression_y**: Dependent variable expression to operate on. Can be a constant, column, or function, and any combination of operators.
- **expression_x**: Independent variable expression to operate on. Can be a constant, column, or function, and any combination of operators.

### Example

```sql
> create table daily_temperature(day int, temperature int) as values (1,30), (2,32), (3, NULL), (4,35), (5,36);
> select * from daily_temperature;
+-----+-------------+
| day | temperature |
+-----+-------------+
| 1   | 30          |
| 2   | 32          |
| 3   | NULL        |
| 4   | 35          |
| 5   | 36          |
+-----+-------------+
```

```sql
SELECT regr_avgy(temperature, day) AS avg_temperature -- temperature is the dependent variable (y)
FROM daily_temperature; -- day 3 is skipped because temperature is NULL: (30 + 32 + 35 + 36) / 4 = 33.25
+-----------------+
| avg_temperature |
+-----------------+
| 33.25           |
+-----------------+
1 row(s) fetched.
Elapsed 0.001 seconds.
```

### `regr_count`

Counts the number of non-null paired data points.
@@ -644,6 +702,36 @@ regr_count(expression_y, expression_x)
- **expression_y**: Dependent variable expression to operate on. Can be a constant, column, or function, and any combination of operators.
- **expression_x**: Independent variable expression to operate on. Can be a constant, column, or function, and any combination of operators.

### Example

Consider the following table:

```sql
> create table daily_metrics(day int, user_signups int) as values (1,100), (2,120), (3, NULL), (4,110), (5,NULL);
> select * from daily_metrics;
+-----+--------------+
| day | user_signups |
+-----+--------------+
| 1   | 100          |
| 2   | 120          |
| 3   | NULL         |
| 4   | 110          |
| 5   | NULL         |
+-----+--------------+
```

```sql
SELECT regr_count(user_signups, day) AS valid_pairs
FROM daily_metrics;
+-------------+
| valid_pairs |
+-------------+
| 3           |
+-------------+
1 row(s) fetched.
Elapsed 0.001 seconds.
```
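
As a rough cross-check of the count above, the following Python sketch applies the same "both values present" rule to the rows of `daily_metrics`. It is an illustration only, with `None` standing in for SQL `NULL`, and does not reflect how DataFusion evaluates the aggregate.

```python
# Rows of daily_metrics: (day, user_signups); None stands in for SQL NULL.
rows = [(1, 100), (2, 120), (3, None), (4, 110), (5, None)]

# A pair counts only when neither side is NULL.
valid_pairs = sum(1 for x, y in rows if x is not None and y is not None)

print(valid_pairs)  # 3
```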

### `regr_intercept`

Computes the y-intercept of the linear regression line. For the equation (y = kx + b), this function returns b.
@@ -657,6 +745,36 @@ regr_intercept(expression_y, expression_x)
- **expression_y**: Dependent variable expression to operate on. Can be a constant, column, or function, and any combination of operators.
- **expression_x**: Independent variable expression to operate on. Can be a constant, column, or function, and any combination of operators.

### Example

Consider the following table:

```sql
> create table weekly_performance(week int, productivity_score int) as values (1,60), (2,65), (3,70), (4,75), (5,80);
> select * from weekly_performance;
+------+--------------------+
| week | productivity_score |
+------+--------------------+
| 1    | 60                 |
| 2    | 65                 |
| 3    | 70                 |
| 4    | 75                 |
| 5    | 80                 |
+------+--------------------+
```

```sql
SELECT regr_intercept(productivity_score, week) AS intercept
FROM weekly_performance;
+-----------+
| intercept |
+-----------+
| 55        |
+-----------+
1 row(s) fetched.
Elapsed 0.001 seconds.
```
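
The intercept above can be reproduced with the textbook least-squares formulas. The Python sketch below assumes those formulas (slope `k = Sxy / Sxx`, intercept `b = mean(y) - k * mean(x)`) and the five rows shown in the table; it is a hand calculation, not a description of DataFusion's internals.

```python
# (week, productivity_score) rows from the example table.
weeks = [1, 2, 3, 4, 5]
scores = [60, 65, 70, 75, 80]

n = len(weeks)
mean_x = sum(weeks) / n   # 3.0
mean_y = sum(scores) / n  # 70.0

# Sum of products of deviations and sum of squared x deviations.
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(weeks, scores))
sxx = sum((x - mean_x) ** 2 for x in weeks)

slope = sxy / sxx                    # k in y = kx + b
intercept = mean_y - slope * mean_x  # b in y = kx + b

print(intercept)  # 55.0
```

In other words, the fitted line is `y = 5x + 55`, which passes through every row of the table.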

### `regr_r2`

Computes the square of the correlation coefficient between the independent and dependent variables.
@@ -670,6 +788,36 @@ regr_r2(expression_y, expression_x)
- **expression_y**: Dependent variable expression to operate on. Can be a constant, column, or function, and any combination of operators.
- **expression_x**: Independent variable expression to operate on. Can be a constant, column, or function, and any combination of operators.

### Example

Consider the following table:

```sql
> create table weekly_performance(week int, productivity_score int) as values (1,60), (2,65), (3,70), (4,75), (5,80);
> select * from weekly_performance;
+------+--------------------+
| week | productivity_score |
+------+--------------------+
| 1    | 60                 |
| 2    | 65                 |
| 3    | 70                 |
| 4    | 75                 |
| 5    | 80                 |
+------+--------------------+
```

```sql
SELECT regr_r2(productivity_score, week) AS r_squared
FROM weekly_performance;
+-----------+
| r_squared |
+-----------+
| 1.0       |
+-----------+
1 row(s) fetched.
Elapsed 0.001 seconds.
```
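
A result of 1.0 means the points lie exactly on a straight line. As a sketch of why, the Python snippet below evaluates the standard identity `r² = Sxy² / (Sxx * Syy)` on the rows from the table. The identity is the usual textbook one and is assumed here; how DataFusion computes the value internally is not covered.

```python
# (week, productivity_score) rows from the example table.
weeks = [1, 2, 3, 4, 5]
scores = [60, 65, 70, 75, 80]

n = len(weeks)
mean_x = sum(weeks) / n
mean_y = sum(scores) / n

sxx = sum((x - mean_x) ** 2 for x in weeks)    # 10.0
syy = sum((y - mean_y) ** 2 for y in scores)   # 250.0
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(weeks, scores))  # 50.0

# Square of the correlation coefficient.
r_squared = sxy ** 2 / (sxx * syy)

print(r_squared)  # 1.0
```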

### `regr_slope`

Returns the slope of the linear regression line for non-null pairs in aggregate columns. Given input column Y and X: regr_slope(Y, X) returns the slope (k in Y = k\*X + b) using minimal RSS fitting.
@@ -683,6 +831,37 @@ regr_slope(expression_y, expression_x)
- **expression_y**: Dependent variable expression to operate on. Can be a constant, column, or function, and any combination of operators.
- **expression_x**: Independent variable expression to operate on. Can be a constant, column, or function, and any combination of operators.

### Example

```sql
> create table weekly_performance(week int, productivity_score int) as values (1,60), (2,65), (3,70), (4,75), (5,80);
> select * from weekly_performance;
+------+--------------------+
| week | productivity_score |
+------+--------------------+
| 1    | 60                 |
| 2    | 65                 |
| 3    | 70                 |
| 4    | 75                 |
| 5    | 80                 |
+------+--------------------+
```

```sql
-- in simpler terms, slope = Δy/Δx
SELECT regr_slope(productivity_score, week) AS slope
FROM weekly_performance;
+-------+
| slope |
+-------+
| 5.0   |
+-------+
1 row(s) fetched.
Elapsed 0.001 seconds.
```

**Remember**: the slope tells _how much y changes when x increases by 1._
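
To make the "change in y per unit of x" reading concrete, the Python sketch below computes the step between consecutive rows and compares it with the least-squares slope `Sxy / Sxx`. It assumes the five rows from the table and the textbook formula; it is not taken from DataFusion's source.

```python
# (week, productivity_score) rows from the example table.
points = [(1, 60), (2, 65), (3, 70), (4, 75), (5, 80)]

# Change in y divided by change in x between consecutive rows.
steps = [(y2 - y1) / (x2 - x1) for (x1, y1), (x2, y2) in zip(points, points[1:])]
print(steps)  # [5.0, 5.0, 5.0, 5.0]

# Least-squares slope over all rows.
n = len(points)
mean_x = sum(x for x, _ in points) / n
mean_y = sum(y for _, y in points) / n
sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
sxx = sum((x - mean_x) ** 2 for x, _ in points)
slope = sxy / sxx
print(slope)  # 5.0
```

Because the data are perfectly linear, every step equals the fitted slope. With noisy data the steps would vary and the fitted slope would be the value that minimizes the residual sum of squares.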

### `regr_sxx`

Computes the sum of squares of the independent variable.
@@ -696,6 +875,36 @@ regr_sxx(expression_y, expression_x)
- **expression_y**: Dependent variable expression to operate on. Can be a constant, column, or function, and any combination of operators.
- **expression_x**: Independent variable expression to operate on. Can be a constant, column, or function, and any combination of operators.

### Example

Consider the following table `study_hours`, where `hours` is the independent variable (x) and `test_score` is the dependent variable (y):

```sql
> create table study_hours(student_id int, hours int, test_score int) as values (1,2,55), (2,4,65), (3,6,75), (4,8,85), (5,10,95);
> select * from study_hours;
+------------+-------+------------+
| student_id | hours | test_score |
+------------+-------+------------+
| 1          | 2     | 55         |
| 2          | 4     | 65         |
| 3          | 6     | 75         |
| 4          | 8     | 85         |
| 5          | 10    | 95         |
+------------+-------+------------+
```

```sql
SELECT regr_sxx(test_score, hours) AS sxx
FROM study_hours; -- sum of (hours - 6)^2 = 16 + 4 + 0 + 4 + 16 = 40
+-----+
| sxx |
+-----+
| 40  |
+-----+
1 row(s) fetched.
Elapsed 0.001 seconds.
```
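
Note that the result is the sum of squared deviations from the mean, not the raw sum of squares. The Python sketch below works through both forms for the `hours` column, assuming the standard definition `Sxx = Σ(x - mean(x))²`; it is a hand calculation for illustration only.

```python
# hours column (x) from study_hours.
hours = [2, 4, 6, 8, 10]

n = len(hours)
mean_x = sum(hours) / n  # 6.0

# Definition: sum of squared deviations from the mean.
sxx = sum((x - mean_x) ** 2 for x in hours)

# Equivalent shortcut: sum(x^2) - (sum(x))^2 / n
raw_sum_of_squares = sum(x * x for x in hours)  # 220
sxx_shortcut = raw_sum_of_squares - sum(hours) ** 2 / n

print(sxx)           # 40.0
print(sxx_shortcut)  # 40.0
```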

### `regr_sxy`

Computes the sum of products of paired data points.
@@ -709,6 +918,32 @@ regr_sxy(expression_y, expression_x)
- **expression_y**: Dependent variable expression to operate on. Can be a constant, column, or function, and any combination of operators.
- **expression_x**: Independent variable expression to operate on. Can be a constant, column, or function, and any combination of operators.

### Example

```sql
> create table employee_productivity(week int, productivity_score int) as values (1,60), (2,65), (3,70);
> select * from employee_productivity;
+------+--------------------+
| week | productivity_score |
+------+--------------------+
| 1    | 60                 |
| 2    | 65                 |
| 3    | 70                 |
+------+--------------------+
```

```sql
SELECT regr_sxy(productivity_score, week) AS sum_product_deviations
FROM employee_productivity;
+------------------------+
| sum_product_deviations |
+------------------------+
| 10.0                   |
+------------------------+
1 row(s) fetched.
Elapsed 0.001 seconds.
```
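
The value 10.0 is the sum of products of deviations, `Sxy = Σ(x - mean(x)) * (y - mean(y))`. The Python sketch below works it out using the values from the `create table` statement; `sum_product_deviations` is a hypothetical helper, and the formula is the standard definition rather than anything read from DataFusion's code.

```python
def sum_product_deviations(xs, ys):
    """Sum of (x - mean(x)) * (y - mean(y)) over paired values (hypothetical helper)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))


weeks = [1, 2, 3]
scores = [60, 65, 70]

sxy = sum_product_deviations(weeks, scores)
print(sxy)  # 10.0

# Adding a constant to every score shifts the mean by the same amount,
# so the deviations, and therefore Sxy, are unchanged.
sxy_shifted = sum_product_deviations(weeks, [y + 5 for y in scores])
print(sxy_shifted)  # 10.0
```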

### `regr_syy`

Computes the sum of squares of the dependent variable.
@@ -722,6 +957,32 @@ regr_syy(expression_y, expression_x)
- **expression_y**: Dependent variable expression to operate on. Can be a constant, column, or function, and any combination of operators.
- **expression_x**: Independent variable expression to operate on. Can be a constant, column, or function, and any combination of operators.

### Example

```sql
> create table employee_productivity(week int, productivity_score int) as values (1,60), (2,65), (3,70);
> select * from employee_productivity;
+------+--------------------+
| week | productivity_score |
+------+--------------------+
| 1    | 60                 |
| 2    | 65                 |
| 3    | 70                 |
+------+--------------------+
```

```sql
SELECT regr_syy(productivity_score, week) AS sum_squares_y
FROM employee_productivity;
+---------------+
| sum_squares_y |
+---------------+
| 50.0          |
+---------------+
1 row(s) fetched.
Elapsed 0.001 seconds.
```
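
As with `regr_sxx`, the result is a sum of squared deviations from the mean, here for the dependent variable. The Python sketch below assumes the standard definition `Syy = Σ(y - mean(y))²` and the values from the `create table` statement; `sum_squared_deviations` is a hypothetical helper used only for illustration.

```python
def sum_squared_deviations(values):
    """Sum of (v - mean(v))^2 over the given values (hypothetical helper)."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


scores = [60, 65, 70]

syy = sum_squared_deviations(scores)  # (-5)^2 + 0^2 + 5^2
print(syy)  # 50.0

# Shifting every score by the same constant leaves the deviations unchanged.
syy_shifted = sum_squared_deviations([y + 5 for y in scores])
print(syy_shifted)  # 50.0
```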

### `stddev`

Returns the standard deviation of a set of numbers.