@@ -46,8 +46,9 @@ Parquet is a file format that enables flexible and efficient data access by, amo
supporting the application of both column and row filters when reading the data (very similar to a SQL query)
so that only the desired data is loaded into memory.

- HATS is a spatial partitioning scheme based on HEALPix that aims to
- produce partitions (files) of roughly equal size.
+ [HATS](https://hats.readthedocs.io/) is a spatial partitioning scheme based on
+ [HEALPix](https://healpix.jpl.nasa.gov/)
+ that aims to produce partitions (files) of roughly equal size.
This makes the files more efficient to work with,
especially for large-scale analyses and/or parallel processing.
It does this by adapting the HEALPix order at which data is partitioned in a given catalog based
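The idea of adapting the HEALPix order to get partitions of roughly equal size can be sketched in plain Python. This is only a toy model, not the HATS implementation: it assumes the split decision is driven by a per-pixel row count, and the `threshold`, `max_order`, and example counts are hypothetical. It relies only on two properties of the HEALPix nested scheme: there are 12 pixels at order 0, and pixel `p` at order `k` has children `4p` to `4p + 3` at order `k + 1`.

```python
def pixel_count(counts, order, pixel, max_order):
    """Sum the rows of all max-order pixels nested inside (order, pixel)."""
    span = 4 ** (max_order - order)
    start = pixel * span
    return sum(counts.get(i, 0) for i in range(start, start + span))


def adaptive_partition(counts, max_order, threshold):
    """Split pixels with more than `threshold` rows into their four children.

    `counts` maps a nested-scheme pixel index at `max_order` to a row count.
    Returns a dict mapping (order, pixel) to the row count of that partition.
    """
    partitions = {}
    stack = [(0, p) for p in range(12)]  # the 12 base pixels at order 0
    while stack:
        order, pixel = stack.pop()
        n = pixel_count(counts, order, pixel, max_order)
        if n == 0:
            continue  # empty regions produce no partition
        if n > threshold and order < max_order:
            stack.extend((order + 1, 4 * pixel + child) for child in range(4))
        else:
            partitions[(order, pixel)] = n
    return partitions


# Hypothetical row counts per order-2 pixel: a dense patch and two sparse ones.
counts = {0: 50, 1: 60, 5: 30, 100: 20}
partitions = adaptive_partition(counts, max_order=2, threshold=100)
print(sorted(partitions.items()))
```

In this toy run the dense region ends up at order 2 while the sparse regions stay at orders 1 and 0, so no partition holds more than 100 rows.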
@@ -143,9 +144,10 @@ In this section, we query the Euclid Q1 MER catalogs for likely stars and create
Here, we use `lsdb` to query the parquet files that are sitting in an S3 bucket (the intro notebook uses `pyvo` to query the TAP service).
`lsdb` enables efficient, large-scale queries on HATS catalogs, so let's look at *all* likely stars in Euclid Q1 instead of limiting to 10,000.

- `lsdb` uses Dask for parallelization. So first, set up the workers.
+ `lsdb` uses Dask for parallelization. Set up the client and workers.

```{code-cell}
+ # This client will be used *implicitly* by all subsequent calls that require it.
client = dask.distributed.Client(
    n_workers=os.cpu_count(), threads_per_worker=2, memory_limit="auto"
)
@@ -172,7 +174,7 @@ euclid_stars
```

```{code-cell}
- # Peek at the data.
+ # Peek at the data. This must execute the query to load at least some data, so may take some time.
euclid_stars.head(10)
```
@@ -267,6 +269,6 @@ print(schema.field("RIGHT_ASCENSION-CUTOUTS").metadata)
**Authors:** Troy Raen (Developer; Caltech/IPAC-IRSA) and the IRSA Data Science Team.

- **Updated:** 2025-03-29
+ **Updated:** 2025-05-05

**Contact:** [IRSA Helpdesk](https://irsa.ipac.caltech.edu/docs/help_desk.html) with questions or problems.