46 changes: 27 additions & 19 deletions docs/getting-started/example-datasets/amazon-reviews.md
@@ -8,7 +8,8 @@ title: 'Amazon Customer Review'
This dataset contains over 150M customer reviews of Amazon products. The data is in snappy-compressed Parquet files in AWS S3 that total 49GB in size (compressed). Let's walk through the steps to insert it into ClickHouse.

:::note
The queries below were executed on a **Production** instance of [ClickHouse Cloud](https://clickhouse.cloud).
The queries below were executed on a **Production** instance of ClickHouse Cloud. For more information see
["Playground specifications"](/getting-started/playground#specifications).
:::

## Loading the dataset {#loading-the-dataset}
@@ -86,21 +87,26 @@ CREATE DATABASE amazon

CREATE TABLE amazon.amazon_reviews
(
review_date Date,
marketplace LowCardinality(String),
customer_id UInt64,
review_id String,
product_id String,
product_parent UInt64,
product_title String,
product_category LowCardinality(String),
star_rating UInt8,
helpful_votes UInt32,
total_votes UInt32,
vine Bool,
verified_purchase Bool,
review_headline String,
review_body String
`review_date` Date,
`marketplace` LowCardinality(String),
`customer_id` UInt64,
`review_id` String,
`product_id` String,
`product_parent` UInt64,
`product_title` String,
`product_category` LowCardinality(String),
`star_rating` UInt8,
`helpful_votes` UInt32,
`total_votes` UInt32,
`vine` Bool,
`verified_purchase` Bool,
`review_headline` String,
`review_body` String,
PROJECTION helpful_votes
(
SELECT *
ORDER BY helpful_votes
)
)
ENGINE = MergeTree
ORDER BY (review_date, product_category)
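
The `PROJECTION helpful_votes` clause added above keeps a second copy of the data ordered by `helpful_votes`, which the note later in this diff relies on. For reference, the same projection could be added to a table that already holds data; a sketch of the equivalent ALTER form (not part of this PR, and existing parts must be materialized explicitly):

```sql
-- Sketch: the ALTER equivalent of the inline projection above.
ALTER TABLE amazon.amazon_reviews
    ADD PROJECTION helpful_votes
    (
        SELECT *
        ORDER BY helpful_votes
    );

-- Existing data parts are not rewritten automatically;
-- materialize the projection so old rows are covered too.
ALTER TABLE amazon.amazon_reviews
    MATERIALIZE PROJECTION helpful_votes;
```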
@@ -146,7 +152,7 @@ The original data was about 70G, but compressed in ClickHouse it takes up about

## Example queries {#example-queries}

7. Let's run some queries...here are the top 10 most-helpful reviews in the dataset:
7. Let's run some queries. Here are the top 10 most-helpful reviews in the dataset:

```sql runnable
SELECT
@@ -157,7 +163,9 @@
LIMIT 10
```

Notice the query has to process all 151M rows in less than a second!
:::note
This query is using a projection to speed up performance.
:::
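
One way to confirm that the projection (rather than a full sort of all 151M rows) served a query is to inspect the query log. A sketch, assuming query logging is enabled on the instance:

```sql
-- Sketch: list recent queries that used the helpful_votes projection.
-- system.query_log exposes a `projections` column with the names of
-- projections used, in `database.table.projection` form.
SELECT query, projections
FROM system.query_log
WHERE type = 'QueryFinish'
  AND has(projections, 'amazon.amazon_reviews.helpful_votes')
ORDER BY event_time DESC
LIMIT 5;
```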

8. Here are the top 10 products in Amazon with the most reviews:

@@ -214,7 +222,7 @@ ORDER BY count DESC
LIMIT 50;
```

The query only takes 4 seconds - which is impressive - and the results are a fun read:
Notice the query time for such a large amount of data. The results are also a fun read!

12. We can run the same query again, except this time we search for **awesome** in the reviews:

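The body of the "awesome" query is collapsed out of this hunk. A hedged sketch of a search of that shape, assuming a simple substring match on `review_body` (the actual query on the page may use a different search function):

```sql
-- Sketch: products whose reviews mention "awesome",
-- mirroring the ORDER BY count DESC / LIMIT 50 pattern above.
SELECT
    any(product_title),
    count() AS count
FROM amazon.amazon_reviews
WHERE position(review_body, 'awesome') > 0
GROUP BY product_id
ORDER BY count DESC
LIMIT 50;
```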
8 changes: 8 additions & 0 deletions docs/getting-started/playground.md
@@ -45,3 +45,11 @@ TCP endpoint example with [CLI](../interfaces/cli.md):
```bash
clickhouse client --secure --host play.clickhouse.com --user explorer
```

## Playground specifications {#specifications}

Our ClickHouse Playground is running with the following specifications:

- Hosted on Google Cloud (GCE) in the US Central region (`us-central1`)
- 3 replicas
- 256 GiB of storage and 59 virtual CPUs per replica
1 change: 0 additions & 1 deletion docusaurus.config.en.js
@@ -138,7 +138,6 @@ const config = {
if (
docPath.includes("development") ||
docPath.includes("engines") ||
docPath.includes("getting-started") ||
docPath.includes("interfaces") ||
docPath.includes("operations") ||
docPath.includes("sql-reference")