Fix issue in isQSO_randomforest #844

akrolewski · 2025-06-21T04:44:15Z

I noticed that some objects in the ELGnotqso clustering catalogs were being classified as quasars in isQSO_randomforest. Specifically, I ran the following code snippet. The last two lines should have shape zero (because there should be no quasars in ELGnotqso) but they do not! It seems that preSelection was type int, but should have been type bool; this meant that preSelection was not being applied so objects with W1 > 22.3, W2 > 22.3, etc were being mistakenly classified as quasars.

import numpy as np
from astropy.io import fits
from astropy.table import Table, join
from desitarget.cuts import isELG_colors, notinELG_mask, isQSO_randomforest
import healpy as hp
from scipy.interpolate import RectBivariateSpline
from scipy.interpolate import NearestNDInterpolator

zmin = 0.8
zmax = 1.1

full_file = fits.open('/global/cfs/cdirs/desi/survey/catalogs/Y1/LSS/iron/LSScats/v1.5/ELG_LOPnotqso_full_HPmapcut.dat.fits')[1].data # Is this actually the right file? ok, need to look at my old scripts ....
cat = fits.open('/global/cfs/cdirs/desi/survey/catalogs/Y1/LSS/iron/LSScats/v1.5/ELG_LOPnotqso_clustering.dat.fits')[1].data

cat = join(cat, full_file, keys='TARGETID')
cat = cat[(cat['Z'] >= zmin) & (cat['Z'] <= zmax)]

import time

flux_g_ext_corr = 10**(0.4 * cat['EBV'] * 3.214) * cat['FLUX_G'] 
flux_r_ext_corr = 10**(0.4 * cat['EBV'] * 2.165) * cat['FLUX_R'] 
flux_z_ext_corr = 10**(0.4 * cat['EBV'] * 1.211) * cat['FLUX_Z']

flux_w1 = 10**(0.4 * cat['EBV'] * 0.184) * cat['FLUX_W1'] 
flux_w2 = 10**(0.4 * cat['EBV'] * 0.113) * cat['FLUX_W2'] 

outS = isQSO_randomforest(gflux=flux_g_ext_corr,
	rflux=flux_r_ext_corr,
	zflux=flux_z_ext_corr,
	maskbits=cat['MASKBITS'],
	w1flux=flux_w1,
	w2flux=flux_w2,
	objtype=cat['MORPHTYPE'],
	gnobs=cat['NOBS_G'],
	rnobs=cat['NOBS_R'],
	znobs=cat['NOBS_Z'],
	primary=np.ones_like(cat['NOBS_Z']),
	ra=cat['RA_1'],
	dec=cat['DEC_1'],
	south=True) # I guess south can be either True or False
	
outN = isQSO_randomforest(flux_g_ext_corr,
	flux_r_ext_corr,
	flux_z_ext_corr,
	cat['MASKBITS'],
	flux_w1,
	flux_w2,
	cat['MORPHTYPE'],
	cat['NOBS_G'],
	cat['NOBS_R'],
	cat['NOBS_Z'],
	np.ones_like(cat['NOBS_Z']),
	cat['RA_1'],
	cat['DEC_1'],
	south=False) # I guess south can be either True or False
	

# These should be zero but they're not!!!	
print(np.shape(np.where((outS == 1) & (cat['PHOTSYS_1'] == 'S') & (22.5-2.5*np.log10(flux_w1)
< 22.3) & (22.5-2.5*np.log10(flux_w2) < 22.3) & (22.5-2.5*np.log10(flux_r_ext_corr) < 23)
& (22.5-2.5*np.log10(flux_r_ext_corr) > 16.5) & (cat['MORPHTYPE'] == 'PSF')
)))

print(np.shape(np.where((outN == 1) & (cat['PHOTSYS_1'] == 'N')& (22.5-2.5*np.log10(flux_w1)
< 22.3) & (22.5-2.5*np.log10(flux_w2) < 22.3) & (22.5-2.5*np.log10(flux_r_ext_corr) < 23)
& (22.5-2.5*np.log10(flux_r_ext_corr) > 16.5) & (cat['MORPHTYPE'] == 'PSF')
)))

coveralls · 2025-06-21T04:54:54Z

coverage: 41.104% (+0.003%) from 41.101%
when pulling 0dd522c on fix_bug_qso_rf
into b7c4e61 on main.

geordie666 · 2025-06-21T11:12:42Z

Thanks, @akrolewski.

This change would not be backward-compatible, which we usually try to avoid. So, we have a choice to make:

Rewrite this with a kwarg (e.g. fix_pre_selection=False) that, for catalog work, you could pass as True. Then, if fix_pre_selection==True you would apply your recasting as a Boolean.
Acknowledge that this is a bug and just break backward-compatibility.

Pinging @sbailey for an opinion.

sbailey · 2025-06-23T19:04:53Z

I think the existing code is fine and the problem is that @akrolewski is passing in

primary=np.ones_like(cat['NOBS_Z'])

instead of

primary=cat['NOBS_Z']>0

(I'm not actually sure that is the correct use of NOBS_Z either, but at least it is a boolean input instead of an integer input).

In the first case, primary is an integer array but isQSO_randomforest is expecting it to be a boolean array, and that is mucking up the downstream type of preSelection. If you pass in primary as a boolean array the code appears to be ok.

We could make this more robust by doing initial type checking on the primary input, but I wouldn't recommend just silently casting that to bool either, since that raises the ambiguity of whether the input user had (incorrectly) intended those integers to be indices rather than 0/1 interpreted as False/True, etc.

@akrolewski please review what you intended to be passing as primary, and make sure that is a bool array, and then confirm whether that fixes the problem or if other issues remain.

akrolewski · 2025-06-25T21:56:15Z

Ahh, thanks for catching that Stephen! Yes, I confirm that if I pass boolean input to primary, everything works as expected.

I agree that there is no need to make any changes to the code.

I think it would be helpful to note in the documentation of isQSO_randomforest that primary should be a boolean array. I ran into this issue because I needed to apply target selection to magnified versions of the LSS catalogs to measure magnification bias, but the PRIMARY column is not tracked into the LSS catalogs--which is why I passed a vector of all ones, and I didn't realize that I should be passing a vector of all booleans.

sbailey · 2025-06-25T22:23:37Z

py/desitarget/cuts.py

        # ADM default mask bits from the Legacy Surveys not set.
        preSelection &= imaging_mask(maskbits)
+
+    preSelection = preSelection.astype('bool')


Following discussion in the PR, I suggest that we drop this preSelection type casting line and instead around line 1876 update it to be

if primary is None: primary = np.ones_like(gflux, dtype=bool) else: assert primary.dtype == np.dtype(bool)

or something similar to check that primary is an array of booleans.

And also update the docstring to be complete with describing the inputs and their types. e.g. it is also unclear to me if this function is supposed to support scalar inputs for testing individual targets, or whether it only works with input arrays and scalars are explicitly not supported.

@geordie666 ?

@sbailey: This sounds sensible to me. @akrolewski: I suggest closing this PR, opening a separate issue linking to @sbailey's synopsis, here, and then assigning me to fix that issue.

change preSelection in is_qso_randomforest to type bool

0dd522c

sbailey reviewed Jun 25, 2025

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix issue in isQSO_randomforest #844

Fix issue in isQSO_randomforest #844

Uh oh!

akrolewski commented Jun 21, 2025

Uh oh!

coveralls commented Jun 21, 2025

Uh oh!

geordie666 commented Jun 21, 2025

Uh oh!

sbailey commented Jun 23, 2025

Uh oh!

akrolewski commented Jun 25, 2025

Uh oh!

sbailey Jun 25, 2025

Uh oh!

geordie666 Jun 26, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

Fix issue in isQSO_randomforest #844

Are you sure you want to change the base?

Fix issue in isQSO_randomforest #844

Uh oh!

Conversation

akrolewski commented Jun 21, 2025

Uh oh!

coveralls commented Jun 21, 2025

Uh oh!

geordie666 commented Jun 21, 2025

Uh oh!

sbailey commented Jun 23, 2025

Uh oh!

akrolewski commented Jun 25, 2025

Uh oh!

sbailey Jun 25, 2025

Choose a reason for hiding this comment

Uh oh!

geordie666 Jun 26, 2025

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants