docs/getting-started/example-datasets/nyc-taxi.md
+36 -86 (36 additions & 86 deletions)
@@ -10,17 +10,30 @@ title: 'New York Taxi Data'
 import Tabs from '@theme/Tabs';
 import TabItem from '@theme/TabItem';

-The New York taxi data consists of 3+ billion taxi and for-hire vehicle (Uber, Lyft, etc.) trips originating in New York City since 2009. The dataset can be obtained in a couple of ways:
+The New York taxi data sample consists of 3+ billion taxi and for-hire vehicle (Uber, Lyft, etc.) trips originating in New York City since 2009. This getting started guide uses a 3m row sample.
+
+The full dataset can be obtained in a couple of ways:

 - insert the data directly into ClickHouse Cloud from S3 or GCS
 - download prepared partitions
+- Alternatively users can query the full dataset in our demo environment at [sql.clickhouse.com](https://sql.clickhouse.com/?query=U0VMRUNUIGNvdW50KCkgRlJPTSBueWNfdGF4aS50cmlwcw&chart=eyJ0eXBlIjoibGluZSIsImNvbmZpZyI6eyJ0aXRsZSI6IlRlbXBlcmF0dXJlIGJ5IGNvdW50cnkgYW5kIHllYXIiLCJ4YXhpcyI6InllYXIiLCJ5YXhpcyI6ImNvdW50KCkiLCJzZXJpZXMiOiJDQVNUKHBhc3Nlbmdlcl9jb3VudCwgJ1N0cmluZycpIn19).
+
+
+:::note
+The example queries below were executed on a **Production** instance of ClickHouse Cloud. For more information see
## Load the Data directly from Object Storage {#load-the-data-directly-from-object-storage}

-Let's grab a small subset of the data for getting familiar with it. The data is in TSV files in object storage, which is easily streamed into
+Users' can grab a small subset of the data (3 million rows) for getting familiar with it. The data is in TSV files in object storage, which is easily streamed into
 ClickHouse Cloud using the `s3` table function.

 The same data is stored in both S3 and GCS; choose either tab.
@@ -56,7 +69,7 @@ The same data is stored in both S3 and GCS; choose either tab.
 The following command streams three files from a GCS bucket into the `trips` table (the `{0..2}` syntax is a wildcard for the values 0, 1, and 2):

 ```sql
-INSERT INTO trips
+INSERT INTO nyc_taxi.trips_small
 SELECT
     trip_id,
     pickup_datetime,
@@ -84,10 +97,10 @@ FROM gcs(
 </TabItem>
 <TabItem value="s3" label="S3">

-The following command streams three files from an S3 bucket into the `trips` table (the `{0..2}` syntax is a wildcard for the values 0, 1, and 2):
+The following command streams three files from an S3 bucket into the `trips_small` table (the `{0..2}` syntax is a wildcard for the values 0, 1, and 2):

 ```sql
-INSERT INTO trips
+INSERT INTO nyc_taxi.trips_small
 SELECT
     trip_id,
     pickup_datetime,
@@ -117,122 +130,59 @@ FROM s3(

 ## Sample Queries {#sample-queries}

+The following queries are executed on the sample described above. Users can run the sample queries on the full dataset in [sql.clickhouse.com](https://sql.clickhouse.com/?query=U0VMRUNUIGNvdW50KCkgRlJPTSBueWNfdGF4aS50cmlwcw&chart=eyJ0eXBlIjoibGluZSIsImNvbmZpZyI6eyJ0aXRsZSI6IlRlbXBlcmF0dXJlIGJ5IGNvdW50cnkgYW5kIHllYXIiLCJ4YXhpcyI6InllYXIiLCJ5YXhpcyI6ImNvdW50KCkiLCJzZXJpZXMiOiJDQVNUKHBhc3Nlbmdlcl9jb3VudCwgJ1N0cmluZycpIn19), modifying the queries below to use the table `nyc_taxi.trips`.
+
 Let's see how many rows were inserted:

-```sql
+```sql runnable
 SELECT count()
-FROM trips;
+FROM nyc_taxi.trips_small;
 ```

 Each TSV file has about 1M rows, and the three files have 3,000,317 rows. Let's look at a few rows:

-```sql
+```sql runnable
 SELECT *
-FROM trips
+FROM nyc_taxi.trips_small
 LIMIT 10;
 ```

-Notice there are columns for the pickup and dropoff dates, geo coordinates, fare details, New York neighborhoods, and more:
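
For readers following the added Sample Queries paragraph, switching to the full dataset should only require changing the table name. A minimal sketch, assuming (as that paragraph states) that the demo environment at sql.clickhouse.com exposes the full data as `nyc_taxi.trips`; row counts and timings will differ from the 3 million row sample:

```sql
-- Row count on the full dataset in the demo environment,
-- instead of the sample table nyc_taxi.trips_small
SELECT count()
FROM nyc_taxi.trips;

-- Peek at a few rows of the full table
SELECT *
FROM nyc_taxi.trips
LIMIT 10;
```

The queries are otherwise unchanged; only the `FROM` clause differs from the versions in the diff above.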