Skip to content

Conversation

@akrolewski
Copy link

I noticed that some objects in the ELGnotqso clustering catalogs were being classified as quasars in isQSO_randomforest. Specifically, I ran the following code snippet. The last two lines should have shape zero (because there should be no quasars in ELGnotqso) but they do not! It seems that preSelection was type int, but should have been type bool; this meant that preSelection was not being applied so objects with W1 > 22.3, W2 > 22.3, etc were being mistakenly classified as quasars.

import numpy as np
from astropy.io import fits
from astropy.table import Table, join
from desitarget.cuts import isELG_colors, notinELG_mask, isQSO_randomforest
import healpy as hp
from scipy.interpolate import RectBivariateSpline
from scipy.interpolate import NearestNDInterpolator

zmin = 0.8
zmax = 1.1

full_file = fits.open('/global/cfs/cdirs/desi/survey/catalogs/Y1/LSS/iron/LSScats/v1.5/ELG_LOPnotqso_full_HPmapcut.dat.fits')[1].data # Is this actually the right file? ok, need to look at my old scripts ....
cat = fits.open('/global/cfs/cdirs/desi/survey/catalogs/Y1/LSS/iron/LSScats/v1.5/ELG_LOPnotqso_clustering.dat.fits')[1].data

cat = join(cat, full_file, keys='TARGETID')
cat = cat[(cat['Z'] >= zmin) & (cat['Z'] <= zmax)]

import time

flux_g_ext_corr = 10**(0.4 * cat['EBV'] * 3.214) * cat['FLUX_G'] 
flux_r_ext_corr = 10**(0.4 * cat['EBV'] * 2.165) * cat['FLUX_R'] 
flux_z_ext_corr = 10**(0.4 * cat['EBV'] * 1.211) * cat['FLUX_Z']

flux_w1 = 10**(0.4 * cat['EBV'] * 0.184) * cat['FLUX_W1'] 
flux_w2 = 10**(0.4 * cat['EBV'] * 0.113) * cat['FLUX_W2'] 

outS = isQSO_randomforest(gflux=flux_g_ext_corr,
	rflux=flux_r_ext_corr,
	zflux=flux_z_ext_corr,
	maskbits=cat['MASKBITS'],
	w1flux=flux_w1,
	w2flux=flux_w2,
	objtype=cat['MORPHTYPE'],
	gnobs=cat['NOBS_G'],
	rnobs=cat['NOBS_R'],
	znobs=cat['NOBS_Z'],
	primary=np.ones_like(cat['NOBS_Z']),
	ra=cat['RA_1'],
	dec=cat['DEC_1'],
	south=True) # I guess south can be either True or False
	
outN = isQSO_randomforest(flux_g_ext_corr,
	flux_r_ext_corr,
	flux_z_ext_corr,
	cat['MASKBITS'],
	flux_w1,
	flux_w2,
	cat['MORPHTYPE'],
	cat['NOBS_G'],
	cat['NOBS_R'],
	cat['NOBS_Z'],
	np.ones_like(cat['NOBS_Z']),
	cat['RA_1'],
	cat['DEC_1'],
	south=False) # I guess south can be either True or False
	

# These should be zero but they're not!!!	
print(np.shape(np.where((outS == 1) & (cat['PHOTSYS_1'] == 'S') & (22.5-2.5*np.log10(flux_w1)
< 22.3) & (22.5-2.5*np.log10(flux_w2) < 22.3) & (22.5-2.5*np.log10(flux_r_ext_corr) < 23)
& (22.5-2.5*np.log10(flux_r_ext_corr) > 16.5) & (cat['MORPHTYPE'] == 'PSF')
)))

print(np.shape(np.where((outN == 1) & (cat['PHOTSYS_1'] == 'N')& (22.5-2.5*np.log10(flux_w1)
< 22.3) & (22.5-2.5*np.log10(flux_w2) < 22.3) & (22.5-2.5*np.log10(flux_r_ext_corr) < 23)
& (22.5-2.5*np.log10(flux_r_ext_corr) > 16.5) & (cat['MORPHTYPE'] == 'PSF')
)))

@coveralls
Copy link

Coverage Status

coverage: 41.104% (+0.003%) from 41.101%
when pulling 0dd522c on fix_bug_qso_rf
into b7c4e61 on main.

@geordie666
Copy link
Contributor

Thanks, @akrolewski.

This change would not be backward-compatible, which we usually try to avoid. So, we have a choice to make:

  • Rewrite this with a kwarg (e.g. fix_pre_selection=False) that, for catalog work, you could pass as True. Then, if fix_pre_selection==True you would apply your recasting as a Boolean.
  • Acknowledge that this is a bug and just break backward-compatibility.

Pinging @sbailey for an opinion.

@sbailey
Copy link
Contributor

sbailey commented Jun 23, 2025

I think the existing code is fine and the problem is that @akrolewski is passing in

primary=np.ones_like(cat['NOBS_Z'])

instead of

primary=cat['NOBS_Z']>0

(I'm not actually sure that is the correct use of NOBS_Z either, but at least it is a boolean input instead of an integer input).

In the first case, primary is an integer array but isQSO_randomforest is expecting it to be a boolean array, and that is mucking up the downstream type of preSelection. If you pass in primary as a boolean array the code appears to be ok.

We could make this more robust by doing initial type checking on the primary input, but I wouldn't recommend just silently casting that to bool either, since that raises the ambiguity of whether the input user had (incorrectly) intended those integers to be indices rather than 0/1 interpreted as False/True, etc.

@akrolewski please review what you intended to be passing as primary, and make sure that is a bool array, and then confirm whether that fixes the problem or if other issues remain.

@akrolewski
Copy link
Author

Ahh, thanks for catching that Stephen! Yes, I confirm that if I pass boolean input to primary, everything works as expected.

I agree that there is no need to make any changes to the code.

I think it would be helpful to note in the documentation of isQSO_randomforest that primary should be a boolean array. I ran into this issue because I needed to apply target selection to magnified versions of the LSS catalogs to measure magnification bias, but the PRIMARY column is not tracked into the LSS catalogs--which is why I passed a vector of all ones, and I didn't realize that I should be passing a vector of all booleans.

# ADM default mask bits from the Legacy Surveys not set.
preSelection &= imaging_mask(maskbits)

preSelection = preSelection.astype('bool')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Following discussion in the PR, I suggest that we drop this preSelection type casting line and instead around line 1876 update it to be

if primary is None:
    primary = np.ones_like(gflux, dtype=bool)
else:
    assert primary.dtype == np.dtype(bool)

or something similar to check that primary is an array of booleans.

And also update the docstring to be complete with describing the inputs and their types. e.g. it is also unclear to me if this function is supposed to support scalar inputs for testing individual targets, or whether it only works with input arrays and scalars are explicitly not supported.

@geordie666 ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sbailey: This sounds sensible to me. @akrolewski: I suggest closing this PR, opening a separate issue linking to @sbailey's synopsis, here, and then assigning me to fix that issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants