Commit 5b76efb

Merge pull request #3640 from ClickHouse/nyc_taxis
nyc taxis
2 parents 2191d7e + 18b9cb7 commit 5b76efb

4 files changed: +50 -98 lines changed

docs/getting-started/example-datasets/nyc-taxi.md

Lines changed: 36 additions & 86 deletions
@@ -10,17 +10,30 @@ title: 'New York Taxi Data'
 import Tabs from '@theme/Tabs';
 import TabItem from '@theme/TabItem';

-The New York taxi data consists of 3+ billion taxi and for-hire vehicle (Uber, Lyft, etc.) trips originating in New York City since 2009. The dataset can be obtained in a couple of ways:
+The New York taxi data sample consists of 3+ billion taxi and for-hire vehicle (Uber, Lyft, etc.) trips originating in New York City since 2009. This getting started guide uses a 3m row sample.
+
+The full dataset can be obtained in a couple of ways:

 - insert the data directly into ClickHouse Cloud from S3 or GCS
 - download prepared partitions
+- Alternatively users can query the full dataset in our demo environment at [sql.clickhouse.com](https://sql.clickhouse.com/?query=U0VMRUNUIGNvdW50KCkgRlJPTSBueWNfdGF4aS50cmlwcw&chart=eyJ0eXBlIjoibGluZSIsImNvbmZpZyI6eyJ0aXRsZSI6IlRlbXBlcmF0dXJlIGJ5IGNvdW50cnkgYW5kIHllYXIiLCJ4YXhpcyI6InllYXIiLCJ5YXhpcyI6ImNvdW50KCkiLCJzZXJpZXMiOiJDQVNUKHBhc3Nlbmdlcl9jb3VudCwgJ1N0cmluZycpIn19).
+
+
+:::note
+The example queries below were executed on a **Production** instance of ClickHouse Cloud. For more information see
+["Playground specifications"](/getting-started/playground#specifications).
+:::
+

 ## Create the table trips {#create-the-table-trips}

 Start by creating a table for the taxi rides:

 ```sql
-CREATE TABLE trips (
+
+CREATE DATABASE nyc_taxi;
+
+CREATE TABLE nyc_taxi.trips_small (
 trip_id UInt32,
 pickup_datetime DateTime,
 dropoff_datetime DateTime,
@@ -45,7 +58,7 @@ PRIMARY KEY (pickup_datetime, dropoff_datetime);

 ## Load the Data directly from Object Storage {#load-the-data-directly-from-object-storage}

-Let's grab a small subset of the data for getting familiar with it. The data is in TSV files in object storage, which is easily streamed into
+Users' can grab a small subset of the data (3 million rows) for getting familiar with it. The data is in TSV files in object storage, which is easily streamed into
 ClickHouse Cloud using the `s3` table function.

 The same data is stored in both S3 and GCS; choose either tab.
@@ -56,7 +69,7 @@ The same data is stored in both S3 and GCS; choose either tab.
 The following command streams three files from a GCS bucket into the `trips` table (the `{0..2}` syntax is a wildcard for the values 0, 1, and 2):

 ```sql
-INSERT INTO trips
+INSERT INTO nyc_taxi.trips_small
 SELECT
 trip_id,
 pickup_datetime,
@@ -84,10 +97,10 @@ FROM gcs(
 </TabItem>
 <TabItem value="s3" label="S3">

-The following command streams three files from an S3 bucket into the `trips` table (the `{0..2}` syntax is a wildcard for the values 0, 1, and 2):
+The following command streams three files from an S3 bucket into the `trips_small` table (the `{0..2}` syntax is a wildcard for the values 0, 1, and 2):

 ```sql
-INSERT INTO trips
+INSERT INTO nyc_taxi.trips_small
 SELECT
 trip_id,
 pickup_datetime,
@@ -117,122 +130,59 @@ FROM s3(

 ## Sample Queries {#sample-queries}

+The following queries are executed on the sample described above. Users can run the sample queries on the full dataset in [sql.clickhouse.com](https://sql.clickhouse.com/?query=U0VMRUNUIGNvdW50KCkgRlJPTSBueWNfdGF4aS50cmlwcw&chart=eyJ0eXBlIjoibGluZSIsImNvbmZpZyI6eyJ0aXRsZSI6IlRlbXBlcmF0dXJlIGJ5IGNvdW50cnkgYW5kIHllYXIiLCJ4YXhpcyI6InllYXIiLCJ5YXhpcyI6ImNvdW50KCkiLCJzZXJpZXMiOiJDQVNUKHBhc3Nlbmdlcl9jb3VudCwgJ1N0cmluZycpIn19), modifying the queries below to use the table `nyc_taxi.trips`.
+
 Let's see how many rows were inserted:

-```sql
+```sql runnable
 SELECT count()
-FROM trips;
+FROM nyc_taxi.trips_small;
 ```

 Each TSV file has about 1M rows, and the three files have 3,000,317 rows. Let's look at a few rows:

-```sql
+```sql runnable
 SELECT *
-FROM trips
+FROM nyc_taxi.trips_small
 LIMIT 10;
 ```

-Notice there are columns for the pickup and dropoff dates, geo coordinates, fare details, New York neighborhoods, and more:
-
-```response
-┌────trip_id─┬─────pickup_datetime─┬────dropoff_datetime─┬───pickup_longitude─┬────pickup_latitude─┬──dropoff_longitude─┬───dropoff_latitude─┬─passenger_count─┬─trip_distance─┬─fare_amount─┬─extra─┬─tip_amount─┬─tolls_amount─┬─total_amount─┬─payment_type─┬─pickup_ntaname─────────────────────────────┬─dropoff_ntaname────────────────────────────┐
-│ 1200864931 │ 2015-07-01 00:00:13 │ 2015-07-01 00:14:41 │ -73.99046325683594 │ 40.746116638183594 │ -73.97918701171875 │ 40.78467559814453 │ 5 │ 3.54 │ 13.5 │ 0.5 │ 1 │ 0 │ 15.8 │ CSH │ Midtown-Midtown South │ Upper West Side │
-│ 1200018648 │ 2015-07-01 00:00:16 │ 2015-07-01 00:02:57 │ -73.78358459472656 │ 40.648677825927734 │ -73.80242919921875 │ 40.64767837524414 │ 1 │ 1.45 │ 6 │ 0.5 │ 0 │ 0 │ 7.3 │ CRE │ Airport │ Airport │
-│ 1201452450 │ 2015-07-01 00:00:20 │ 2015-07-01 00:11:07 │ -73.98579406738281 │ 40.72777557373047 │ -74.00482177734375 │ 40.73748779296875 │ 5 │ 1.56 │ 8.5 │ 0.5 │ 1.96 │ 0 │ 11.76 │ CSH │ East Village │ West Village │
-│ 1202368372 │ 2015-07-01 00:00:40 │ 2015-07-01 00:05:46 │ -74.00206756591797 │ 40.73833084106445 │ -74.00658416748047 │ 40.74875259399414 │ 2 │ 1 │ 6 │ 0.5 │ 0 │ 0 │ 7.3 │ CRE │ West Village │ Hudson Yards-Chelsea-Flatiron-Union Square │
-│ 1200831168 │ 2015-07-01 00:01:06 │ 2015-07-01 00:09:23 │ -73.98748016357422 │ 40.74344253540039 │ -74.00575256347656 │ 40.716793060302734 │ 1 │ 2.3 │ 9 │ 0.5 │ 2 │ 0 │ 12.3 │ CSH │ Hudson Yards-Chelsea-Flatiron-Union Square │ SoHo-TriBeCa-Civic Center-Little Italy │
-│ 1201362116 │ 2015-07-01 00:01:07 │ 2015-07-01 00:03:31 │ -73.9926986694336 │ 40.75826644897461 │ -73.98628997802734 │ 40.76075744628906 │ 1 │ 0.6 │ 4 │ 0.5 │ 0 │ 0 │ 5.3 │ CRE │ Clinton │ Midtown-Midtown South │
-│ 1200639419 │ 2015-07-01 00:01:13 │ 2015-07-01 00:03:56 │ -74.00382995605469 │ 40.741981506347656 │ -73.99711608886719 │ 40.742271423339844 │ 1 │ 0.49 │ 4 │ 0.5 │ 0 │ 0 │ 5.3 │ CRE │ Hudson Yards-Chelsea-Flatiron-Union Square │ Hudson Yards-Chelsea-Flatiron-Union Square │
-│ 1201181622 │ 2015-07-01 00:01:17 │ 2015-07-01 00:05:12 │ -73.9512710571289 │ 40.78261947631836 │ -73.95230865478516 │ 40.77476119995117 │ 4 │ 0.97 │ 5 │ 0.5 │ 1 │ 0 │ 7.3 │ CSH │ Upper East Side-Carnegie Hill │ Yorkville │
-│ 1200978273 │ 2015-07-01 00:01:28 │ 2015-07-01 00:09:46 │ -74.00822448730469 │ 40.72113037109375 │ -74.00422668457031 │ 40.70782470703125 │ 1 │ 1.71 │ 8.5 │ 0.5 │ 1.96 │ 0 │ 11.76 │ CSH │ SoHo-TriBeCa-Civic Center-Little Italy │ Battery Park City-Lower Manhattan │
-│ 1203283366 │ 2015-07-01 00:01:47 │ 2015-07-01 00:24:26 │ -73.98199462890625 │ 40.77289962768555 │ -73.91968536376953 │ 40.766082763671875 │ 3 │ 5.26 │ 19.5 │ 0.5 │ 5.2 │ 0 │ 26 │ CSH │ Lincoln Square │ Astoria │
-└────────────┴─────────────────────┴─────────────────────┴────────────────────┴────────────────────┴────────────────────┴────────────────────┴─────────────────┴───────────────┴─────────────┴───────┴────────────┴──────────────┴──────────────┴──────────────┴────────────────────────────────────────────┴────────────────────────────────────────────┘
-```
+Notice there are columns for the pickup and dropoff dates, geo coordinates, fare details, New York neighborhoods, and more.
+

 Let's run a few queries. This query shows us the top 10 neighborhoods that have the most frequent pickups:

-```sql
+```sql runnable
 SELECT
 pickup_ntaname,
 count(*) AS count
-FROM trips
+FROM nyc_taxi.trips_small WHERE pickup_ntaname != ''
 GROUP BY pickup_ntaname
 ORDER BY count DESC
 LIMIT 10;
 ```

-The result is:
-
-```response
-┌─pickup_ntaname─────────────────────────────┬──count─┐
-│ Midtown-Midtown South │ 526864 │
-│ Hudson Yards-Chelsea-Flatiron-Union Square │ 288797 │
-│ West Village │ 210436 │
-│ Turtle Bay-East Midtown │ 197111 │
-│ Upper East Side-Carnegie Hill │ 184327 │
-│ Airport │ 151343 │
-│ SoHo-TriBeCa-Civic Center-Little Italy │ 144967 │
-│ Murray Hill-Kips Bay │ 138599 │
-│ Upper West Side │ 135469 │
-│ Clinton │ 130002 │
-└────────────────────────────────────────────┴────────┘
-```
-
 This query shows the average fare based on the number of passengers:

-```sql
+```sql runnable view='chart' chart_config='eyJ0eXBlIjoiYmFyIiwiY29uZmlnIjp7InhheGlzIjoicGFzc2VuZ2VyX2NvdW50IiwieWF4aXMiOiJhdmcodG90YWxfYW1vdW50KSIsInRpdGxlIjoiQXZlcmFnZSBmYXJlIGJ5IHBhc3NlbmdlciBjb3VudCJ9fQ'
 SELECT
 passenger_count,
 avg(total_amount)
-FROM trips
+FROM nyc_taxi.trips_small
+WHERE passenger_count < 10
 GROUP BY passenger_count;
 ```

-```response
-┌─passenger_count─┬──avg(total_amount)─┐
-│ 0 │ 25.226335263065018 │
-│ 1 │ 15.961279340656672 │
-│ 2 │ 17.146174183960667 │
-│ 3 │ 17.65380033178517 │
-│ 4 │ 17.248804201047456 │
-│ 5 │ 16.353501285179135 │
-│ 6 │ 15.995094439202836 │
-│ 7 │ 62.077143805367605 │
-│ 8 │ 26.120000791549682 │
-│ 9 │ 10.300000190734863 │
-└─────────────────┴────────────────────┘
-```
-
 Here's a correlation between the number of passengers and the distance of the trip:

-```sql
+```sql runnable chart_config='eyJ0eXBlIjoiaG9yaXpvbnRhbCBiYXIiLCJjb25maWciOnsidGl0bGUiOiJEaXN0YW5jZSBieSBwYXNzZW5nZXIgY291bnQiLCJ4YXhpcyI6InBhc3Nlbmdlcl9jb3VudCIsInlheGlzIjoiZGlzdGFuY2UiLCJ6YXhpcyI6ImNvdW50KCkifX0'
 SELECT
 passenger_count,
-toYear(pickup_datetime) AS year,
 round(trip_distance) AS distance,
 count(*)
-FROM trips
-GROUP BY passenger_count, year, distance
-ORDER BY year, count(*) DESC;
-```
-
-The first part of the result is:
-
-```response
-┌─passenger_count─┬─year─┬─distance─┬─count()─┐
-│ 1 │ 2015 │ 1 │ 748644 │
-│ 1 │ 2015 │ 2 │ 521602 │
-│ 1 │ 2015 │ 3 │ 225077 │
-│ 2 │ 2015 │ 1 │ 144990 │
-│ 1 │ 2015 │ 4 │ 134782 │
-│ 1 │ 2015 │ 0 │ 127284 │
-│ 2 │ 2015 │ 2 │ 106411 │
-│ 1 │ 2015 │ 5 │ 72725 │
-│ 5 │ 2015 │ 1 │ 59343 │
-│ 1 │ 2015 │ 6 │ 53447 │
-│ 2 │ 2015 │ 3 │ 48019 │
-│ 3 │ 2015 │ 1 │ 44865 │
-│ 6 │ 2015 │ 1 │ 39409 │
+FROM nyc_taxi.trips_small
+GROUP BY passenger_count, distance
+ORDER BY passenger_count ASC, count(*) DESC;
 ```

 ## Download of Prepared Partitions {#download-of-prepared-partitions}

sidebars.js

Lines changed: 1 addition & 1 deletion
@@ -196,6 +196,7 @@ const sidebars = {
 collapsible: true,
 link: { type: "doc", id: "getting-started/index" },
 items: [
+"getting-started/example-datasets/nyc-taxi",
 "getting-started/example-datasets/amazon-reviews",
 "getting-started/example-datasets/amplab-benchmark",
 "getting-started/example-datasets/brown-benchmark",
@@ -209,7 +210,6 @@ const sidebars = {
 "getting-started/example-datasets/menus",
 "getting-started/example-datasets/metrica",
 "getting-started/example-datasets/noaa",
-"getting-started/example-datasets/nyc-taxi",
 "getting-started/example-datasets/nypd_complaint_data",
 "getting-started/example-datasets/ontime",
 "getting-started/example-datasets/opensky",

src/components/CodeViewer/CodeInterpreter.tsx

Lines changed: 12 additions & 10 deletions
@@ -175,6 +175,7 @@ function CodeInterpreter({
 <Tooltip.Trigger>
 <Button
 className='h-full m-auto'
+fillWidth={false}
 iconLeft='chevron-down'
 onClick={closeResultPanel}
 type='empty'></Button>
@@ -186,6 +187,7 @@ function CodeInterpreter({
 <Tooltip.Trigger>
 <Button
 className='h-full m-auto'
+fillWidth={false}
 iconLeft='chevron-up'
 onClick={openTableResultPanel}
 type='empty'></Button>
@@ -195,15 +197,15 @@ function CodeInterpreter({
 )

 return (
-<div className='flex items-end whitespace-pre-wrap'>
+<div className={`flex items-end whitespace-pre-wrap ${chart && 'w-[180px] h-[28px]'}`}>
 {show_results}
 {chart && (
-<div className='my-auto w-[80px] sm:w-[140px]'>
+<div className=''>
 <RadioGroup
 orientation='vertical'
 value={currentView}>
 <RadioGroup.Item
-label='Table'
+label='Table'
 onClick={(): void => {
 setCurrentView(DefaultView.Table)
 }}
@@ -230,14 +232,14 @@ function CodeInterpreter({
 return (
 <div className='flex justify-between'>
 <div className='flex items-center'>
-<div className='flex items-center'>{hideTableResultButton()}</div>
-<div className='flex items-center'>
-{show_statistics && results?.response?.statistics && (
-<div className={`whitespace-pre-wrap text-xs mx-auto italic ${chart ? 'ml-[8px]' : ''}`}>
-{`${runBy()} Read ${formatReadableRows(results.response.statistics.rows_read)} rows and ${formatBytes(results.response.statistics.bytes_read)} in ${roundToDynamicPrecision(results.response.statistics.elapsed)} seconds`}
+<div className='flex items-center'>{hideTableResultButton()}</div>
+<div className='flex items-end min-h-[28px]'>
+{show_statistics && results?.response?.statistics && (
+<div className={`whitespace-pre-wrap text-[12px] mx-auto italic ${chart ? 'ml-[8px]' : ''}`}>
+{`Read ${formatReadableRows(results.response.statistics.rows_read)} rows and ${formatBytes(results.response.statistics.bytes_read)} in ${roundToDynamicPrecision(results.response.statistics.elapsed)} secs`}
+</div>
+)}
 </div>
-)}
-</div>
 </div>

 <div className='flex items-center'>

src/components/CodeViewer/index.tsx

Lines changed: 1 addition & 1 deletion
@@ -1,4 +1,4 @@
-import { CodeBlock, ClickUIProvider, Text, Separator } from '@clickhouse/click-ui/bundled'
+import { CodeBlock, ClickUIProvider, Text } from '@clickhouse/click-ui/bundled'
 import CodeInterpreter from './CodeInterpreter'
 import { DefaultView } from './CodeResults'
 import { ChartConfig, ChartType } from './types'
