Jet clustering and tagging should be a default feature in the CLD reconstruction chain.
Jet clustering is currently enabled by passing an additional argument and uses the Valencia algorithm via LCFIPlus. We could reconsider whether this step should instead use the Durham algorithm, which is widely used, e.g. in FCCAnalyses. The two methods show large differences in their jet definition (e.g. the number of PFOs summed over all jets does not equal the total number of PFOs in the event in LCFIPlus, while it does in FCCAnalyses).
The following plots show that, for the exact same events (ZH->vvbb, /eos/experiment/fcc/prod/fcc/ee/test_spring2024/240gev/Hbb/CLD_o2_v05/rec/00016783/000/Hbb_rec_16783_1.root), FCCAnalyses and CLDConfig (labeled as key4hep here) do not yield the same number of particles per jet.
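To make the constituent-count statement above concrete, here is a minimal pure-Python sketch of exclusive Durham (e+e- kT) clustering with E-scheme recombination. It is only an illustration, not the FCCAnalyses or LCFIPlus implementation; the function names and the toy four-momenta are made up for this example. Because the Durham measure has no beam distance, every input particle ends up in exactly one of the N jets, so the constituent counts sum to the number of inputs.

```python
import math
from itertools import combinations


def durham_distance(p, q, e_vis):
    """Durham measure y_ij = 2 min(E_i^2, E_j^2) (1 - cos theta_ij) / E_vis^2.

    p and q are four-momenta given as (E, px, py, pz).
    """
    (e1, x1, y1, z1), (e2, x2, y2, z2) = p, q
    norm = math.sqrt(x1 * x1 + y1 * y1 + z1 * z1) * math.sqrt(x2 * x2 + y2 * y2 + z2 * z2)
    # Treat a zero-momentum object as collinear with everything (distance 0).
    cos_theta = (x1 * x2 + y1 * y2 + z1 * z2) / norm if norm > 0 else 1.0
    return 2.0 * min(e1, e2) ** 2 * (1.0 - cos_theta) / e_vis ** 2


def cluster_exclusive(particles, n_jets):
    """Merge the closest pair until exactly n_jets pseudo-jets remain.

    Returns a list of (four_momentum, constituent_indices) tuples.
    """
    e_vis = sum(p[0] for p in particles)
    jets = [(tuple(p), [i]) for i, p in enumerate(particles)]
    while len(jets) > n_jets:
        # Find the pair of pseudo-jets with the smallest Durham distance.
        i, j = min(
            combinations(range(len(jets)), 2),
            key=lambda ij: durham_distance(jets[ij[0]][0], jets[ij[1]][0], e_vis),
        )
        (p_i, c_i), (p_j, c_j) = jets[i], jets[j]
        # E-scheme recombination: add the four-momenta component-wise.
        merged = (tuple(a + b for a, b in zip(p_i, p_j)), c_i + c_j)
        jets = [jet for k, jet in enumerate(jets) if k not in (i, j)] + [merged]
    return jets


if __name__ == "__main__":
    # Toy event (hypothetical values): two roughly back-to-back sprays plus a soft particle.
    toy_event = [
        (10.0, 10.0, 0.0, 0.0),
        (5.0, 4.9, 0.99, 0.0),
        (8.0, -8.0, 0.0, 0.0),
        (4.0, -3.9, -0.8, 0.0),
        (1.0, 0.0, 1.0, 0.0),
    ]
    jets = cluster_exclusive(toy_event, n_jets=2)
    n_constituents = sum(len(indices) for _, indices in jets)
    print(f"{len(jets)} jets, {n_constituents} constituents, {len(toy_event)} input particles")
```

The same bookkeeping check (sum of jet constituents versus number of input PFOs) can be run on the output of either framework to quantify how many PFOs are left outside the jets.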


Using the two different jet definitions also leads to differences in the kinematics:

As jet clustering defines the base on which jet tagging is performed, tagging is sensitive to clustering. The jet tagging in full sim was trained and validated on the LCFIPlus jet clustering definition and does NOT show reasonable results on jets clustered in FCCAnalyses.
The following plot shows the tagging performance on ZH events at 240 GeV using the Durham jet clustering algorithm in FCCAnalyses. The performance is significantly worse than when running the inference via CLDConfig with the LCFIPlus jet clustering definition, whose performance is explained in this note.

I suggested that
- We must understand the jet clustering definition in LCFIPlus and reimplement a similar algorithm in FCCAnalyses, so that full-sim jet tagging becomes available by running the inference in FCCAnalyses. This is how most analyzers might want to set up their workflow.
AND/OR
- We implement the Durham jet clustering algorithm in CLDConfig as an option. (I've tried this and chatted with @Zehvogel and @tmadlener. It did not work straight out of the box, and I stopped trying after a day of debugging.) Then, once we have confirmed that the clustering definition matches the one used in FCCAnalyses (or another standard we agree on), the tagger would need to be retrained on new samples using the Durham jet clustering algorithm.
Regarding the jet tagging implementation in CLDConfig:
The basic functionality is already implemented in this pull request. To implement it properly we need to switch to IOSvc, so @Zehvogel is working on a clean pull request here.
I want to highlight that it might be useful to have jet clustering and tagging implemented for N=2,3,4... jets as a default. To achieve this one would need to think about labeling the collections and how to best set up the collections without unnecessary copies of PFOs.
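One way to picture the labeling question is the toy sketch below. It is not CLDConfig or EDM4hep code: the collection names (`PFOs`, `Jets_N2`, ...) and both helper functions are hypothetical and only illustrate the idea that each N-jet collection could carry an N-dependent label and refer to a single shared PFO collection by index instead of holding its own copies.

```python
def jet_collection_names(n_values, base="Jets"):
    """Map each requested jet multiplicity N to a (hypothetical) collection label."""
    return {n: f"{base}_N{n}" for n in n_values}


def build_event(pfos, assignments_by_n):
    """Store the PFOs once and each N-jet clustering as lists of PFO indices.

    assignments_by_n[n][i] is the jet index of PFO i for the N=n clustering,
    or -1 if that PFO is not assigned to any jet.
    """
    event = {"PFOs": pfos}  # the single shared PFO collection
    names = jet_collection_names(assignments_by_n)
    for n, assignment in assignments_by_n.items():
        jets = [[] for _ in range(n)]
        for pfo_index, jet_index in enumerate(assignment):
            if jet_index >= 0:
                jets[jet_index].append(pfo_index)
        event[names[n]] = jets
    return event


def constituents(event, collection, jet_index):
    """Resolve the index list of one jet back to the shared PFO objects."""
    return [event["PFOs"][i] for i in event[collection][jet_index]]


if __name__ == "__main__":
    # Placeholder PFOs and made-up assignments, purely for illustration.
    pfos = [{"id": i} for i in range(5)]
    event = build_event(pfos, {2: [0, 0, 1, 1, -1], 3: [0, 1, 1, 2, 2]})
    print(sorted(event))
    print(constituents(event, "Jets_N3", 1))
```

With this layout, adding another multiplicity only adds one more list of index lists, and every jet collection points back to the same PFO objects.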