Commit 01112f2

Apply suggestions from @jaladh-singhal code review
1 parent 822db12 commit 01112f2

1 file changed: +7 -5 lines changed

tutorials/parquet-catalog-demos/euclid-hats-parquet.md

Lines changed: 7 additions & 5 deletions
@@ -46,8 +46,9 @@ Parquet is a file format that enables flexible and efficient data access by, amo
 supporting the application of both column and row filters when reading the data (very similar to a SQL query)
 so that only the desired data is loaded into memory.

-HATS is a spatial partitioning scheme based on HEALPix that aims to
-produce partitions (files) of roughly equal size.
+[HATS](https://hats.readthedocs.io/) is a spatial partitioning scheme based on
+[HEALPix](https://healpix.jpl.nasa.gov/)
+that aims to produce partitions (files) of roughly equal size.
 This makes the files more efficient to work with,
 especially for large-scale analyses and/or parallel processing.
 It does this by adapting the HEALPix order at which data is partitioned in a given catalog based
@@ -143,9 +144,10 @@ In this section, we query the Euclid Q1 MER catalogs for likely stars and create
 Here, we use `lsdb` to query the parquet files that are sitting in an S3 bucket (the intro notebook uses `pyvo` to query the TAP service).
 `lsdb` enables efficient, large-scale queries on HATS catalogs, so let's look at *all* likely stars in Euclid Q1 instead of limiting to 10,000.

-`lsdb` uses Dask for parallelization. So first, set up the workers.
+`lsdb` uses Dask for parallelization. Set up the client and workers.

 ```{code-cell}
+# This client will be used *implicitly* by all subsequent calls that require it.
 client = dask.distributed.Client(
     n_workers=os.cpu_count(), threads_per_worker=2, memory_limit="auto"
 )
@@ -172,7 +174,7 @@ euclid_stars
 ```

 ```{code-cell}
-# Peek at the data.
+# Peek at the data. This must execute the query to load at least some data, so may take some time.
 euclid_stars.head(10)
 ```

@@ -267,6 +269,6 @@ print(schema.field("RIGHT_ASCENSION-CUTOUTS").metadata)

 **Authors:** Troy Raen (Developer; Caltech/IPAC-IRSA) and the IRSA Data Science Team.

-**Updated:** 2025-03-29
+**Updated:** 2025-05-05

 **Contact:** [IRSA Helpdesk](https://irsa.ipac.caltech.edu/docs/help_desk.html) with questions or problems.
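
Note on the first hunk: the tutorial text says HATS adapts the HEALPix order of each partition so that files end up roughly equal in size. The sketch below illustrates that idea in plain Python and is not the HATS implementation. The function name `adaptive_partitions`, the `threshold` parameter, and the sample data are all hypothetical. It relies only on a property of the HEALPix NESTED numbering scheme: a pixel `p` at order `k` has children `4p` to `4p + 3` at order `k + 1`, so the parent of a pixel is `p >> 2`.

```python
from collections import Counter


def adaptive_partitions(pixels_at_max_order, max_order, threshold):
    """Group rows into HEALPix-like partitions of at most `threshold` rows.

    Illustrative only (not the HATS algorithm). Input is one NESTED pixel
    index per row, all at `max_order`. Returns {(order, pixel): row_count}.
    """
    # Count rows per pixel at every order, from max_order down to 0.
    counts = {max_order: Counter(pixels_at_max_order)}
    for order in range(max_order - 1, -1, -1):
        coarser = Counter()
        for pix, n in counts[order + 1].items():
            coarser[pix >> 2] += n  # NESTED scheme: parent index is child >> 2
        counts[order] = coarser

    # Start from the coarsest pixels and split only those that are too full.
    partitions = {}
    stack = [(0, pix) for pix in counts[0]]
    while stack:
        order, pix = stack.pop()
        n = counts[order][pix]
        if n <= threshold or order == max_order:
            partitions[(order, pix)] = n
        else:
            for child in range(4 * pix, 4 * pix + 4):
                if counts[order + 1][child] > 0:  # skip empty children
                    stack.append((order + 1, child))
    return partitions


# Hypothetical data: a dense patch (order-2 pixels 0, 1, 2) and a sparse one (pixel 16).
rows = [0] * 5 + [1] * 5 + [2] * 1 + [16] * 2
partitions = adaptive_partitions(rows, max_order=2, threshold=6)
for (order, pix), n in sorted(partitions.items()):
    print(f"order={order} pixel={pix} rows={n}")
```

In this toy input the dense patch is split down to order 2 while the sparse patch stays a single order-0 partition, which is the qualitative behavior the tutorial paragraph describes.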
