Skip to content

Conversation

@kk428
Copy link

@kk428 kk428 commented Dec 8, 2025

In PR #187, I added branches to trackingNtuple.root input file after clustering sim tracks into jets using FastJet. However, this method prevents us from comparing reconstructed tracks to the jets, which I need to do in order to examine how the jets affect the duplicate and fake rates. Here I have replaced the former method of jet clustering with GenJets that are added to trackingNtuple.root at the step where it is produced.

These are the pertinent changes I've made:

  • Added GenJets to Validation/RecoTrack/plugins/TrackingNtuple.cc.
    When the trackingNtuple.root is created with cmsDriver.py, it now will include the GenJet branches for genJetPt, genJetEta, and genJetPhi. I did not include a way to optionally enable/disable this.
  • Created RecoTracker/LSTCore/standalone/analysis/jets/reformat_jets2.py to compute ΔR for the sim tracks and add it to trackingNtuple.root
    This completely replaces both reformat_jets.py and myjets.py from PR 187.
  • Computed ΔR for the reconstructed tracks in RecoTracker/LSTCore/standalone/efficiency/src/performance.cc
    This is still enabled/disabled when running LST by the use of the -j tag.
  • Modified RecoTracker/LSTCore/standalone/code/core/write_lst_ntuple.cc and RecoTracker/LSTCore/standalone/efficiency/python/lst_plot_performance.py in small ways to reflect these changes.

The input file differs from the standard one due to the addition of GenJet branches when it is created and due to the sim_deltaEta, sim_deltaPhi, and sim_deltaR branches that are tacked on after. The only change to the output is the inclusion of efficiency, DR, and FR vs. ΔR plots, where ΔR is the distance from a track to its closest GenJet. Otherwise, this PR should not impact the performance of LST at all. The presence of GenJets in the N-tuple also does not affect LST regardless of whether the jet branches are enabled or disabled.

Copy link

@slava77 slava77 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as I mentioned during the meeting, it would help to fill the closest jet dr per sim_ (and per trk_ ) in this PR rather than to rely on the post-processing script

Comment on lines 97 to 100
#include "DataFormats/HepMCCandidate/interface/GenParticle.h"
#include "DataFormats/JetReco/interface/GenJet.h"
#include "DataFormats/JetReco/interface/GenJetCollection.h"
#include "DataFormats/HepMCCandidate/interface/GenParticleFwd.h"
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

alpha-order

Comment on lines 1298 to 1308
std::vector<uint16_t> ph2_clustSize;
std::vector<size_t> ph2_clustSize;
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this reverts a recent change from @alexandertuna ; please undo

Comment on lines 1431 to 1433
std::vector<float> genJetPt;
std::vector<float> genJetEta;
std::vector<float> genJetPhi;
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

keep the same naming style as other branches genjet_pt,eta,phi

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looking at https://github.com/cms-sw/cmssw/blob/master/DataFormats/JetReco/interface/GenJet.h#L90C11-L99, I think it's worth to add invisibleEnergy, auxiliaryEnergy

@@ -1473,7 +1489,8 @@ TrackingNtuple::TrackingNtuple(const edm::ParameterSet& iConfig)
keepEleSimHits_(iConfig.getUntrackedParameter<bool>("keepEleSimHits")),
saveSimHitsP3_(iConfig.getUntrackedParameter<bool>("saveSimHitsP3")),
simHitBySignificance_(iConfig.getUntrackedParameter<bool>("simHitBySignificance")),
parametersDefiner_(iConfig.getUntrackedParameter<edm::InputTag>("beamSpot"), consumesCollector()) {
parametersDefiner_(iConfig.getUntrackedParameter<edm::InputTag>("beamSpot"), consumesCollector()),
tok_jets_(consumes<reco::GenJetCollection>(iConfig.getParameter<edm::InputTag>("JetSource"))) {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
tok_jets_(consumes<reco::GenJetCollection>(iConfig.getParameter<edm::InputTag>("JetSource"))) {
tok_jets_(consumes<reco::GenJetCollection>(iConfig.getParameter<edm::InputTag>("jetSource"))) {

naming style from https://cms-sw.github.io/cms_coding_rules.html

Comment on lines 2772 to 2777
edm::Handle<reco::GenJetCollection> genJets;
iEvent.getByToken(tok_jets_, genJets);
if (genJets.isValid()) {
for (unsigned iGenJet = 0; iGenJet < genJets->size(); ++iGenJet) {
const reco::GenJet& genJet = (*genJets)[iGenJet];
genJetPt.push_back(genJet.pt());
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
edm::Handle<reco::GenJetCollection> genJets;
iEvent.getByToken(tok_jets_, genJets);
if (genJets.isValid()) {
for (unsigned iGenJet = 0; iGenJet < genJets->size(); ++iGenJet) {
const reco::GenJet& genJet = (*genJets)[iGenJet];
genJetPt.push_back(genJet.pt());
auto const& genJets = iEvent.get(tok_jets_);
for (auto const& jet : genJets) {
genjet_pt.push_back(jet.pt());

more compact access.
Also, I prefer to not silently skip: jets should be available in every event; if missing there's something wrong with the configuration and this should fail.

@@ -1043,103 +1068,101 @@ void fillEfficiencySet(int isimtrk,
const float vtx_z_thresh = 30;
const float vtx_perp_thresh = 2.5;

if (pt > 0 && jet_eta < 140 && jet_eta > -140 && (jet_eta > -999 && deltaEta > -999)) {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(not sure if I searched well enough) shouldn't we have some jet pt minimum requirement to make the jet-related plots?
I expect it to be configurable with a default (order of or e.g.) 100 GeV

Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR replaces the previous jet clustering method (from PR #187) with GenJets added directly to the trackingNtuple.root file during its creation. The changes enable comparison of reconstructed tracks to jets for examining duplicate and fake rates.

Key Changes:

  • GenJet branches (genjet_pt, genjet_eta, genjet_phi, etc.) are now added to trackingNtuple.root during creation
  • ΔR calculations between sim/reco tracks and their closest GenJet are computed and stored
  • Plotting infrastructure updated to support efficiency, duplicate rate, and fake rate vs. ΔR plots

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 17 comments.

Show a summary per file
File Description
Validation/RecoTrack/plugins/TrackingNtuple.cc Adds GenJet collection reading, new branches for GenJet properties and sim track ΔR values, and implements fillGenJets() method to compute ΔR between sim tracks and closest jets
RecoTracker/LSTCore/standalone/efficiency/src/performance.h Updates function parameter names from jet_eta/jet_phi/jet_pt to genJetEta/genJetPhi/genJetPt for consistency
RecoTracker/LSTCore/standalone/efficiency/src/performance.cc Renames jet-related branches to genJet naming, adds ΔR computation for reconstructed tracks, creates deltaR histograms for duplicate/fake rate analysis, and removes previous jet-related conditional logic
RecoTracker/LSTCore/standalone/efficiency/python/lst_plot_performance.py Updates variable choices and plotting labels from jet_* to genJet* naming convention, adds deltaR to additional metric plots
RecoTracker/LSTCore/standalone/code/core/write_lst_ntuple.cc Updates branch names from sim_jet_* to genJet* for consistency with new naming convention
RecoTracker/LSTCore/standalone/analysis/jets/reformat_jets2.py New script that adds sim_deltaEta, sim_deltaPhi, and sim_deltaR branches to trackingNtuple.root by computing ΔR between sim tracks and GenJets
RecoTracker/LSTCore/standalone/analysis/jets/reformat_jets.py Updates hardcoded file paths (appears to be for testing purposes, may be superseded by reformat_jets2.py)
Comments suppressed due to low confidence (3)

RecoTracker/LSTCore/standalone/efficiency/src/performance.cc:1568

  • The loop variable ijet is declared but not initialized. This will cause undefined behavior as the loop condition will evaluate an uninitialized variable. The initialization should be ijet = 0.
  if (std::abs(eta) < ana.eta_cut and pt > ana.pt_cut) {

RecoTracker/LSTCore/standalone/efficiency/src/performance.cc:1503

  • The GenJet vectors are being retrieved from the input file inside this function that is called for every reconstructed track. This is inefficient as it repeatedly reads the same data. Consider moving this retrieval outside the loop or to a higher scope where it can be done once per event.
  }
  if (std::abs(eta) < ana.eta_cut and pt > ana.pt_cut) {
    ana.tx.pushbackToBranch<float>(category_name + "_dr_denom_phi", phi);

RecoTracker/LSTCore/standalone/efficiency/src/performance.cc:1563

  • The GenJet vectors are being retrieved from the input file inside this function that is called for every reconstructed track. This is inefficient as it repeatedly reads the same data. Consider moving this retrieval outside the loop or to a higher scope where it can be done once per event.
      ana.tx.pushbackToBranch<float>(category_name + "_fdr_numer_eta", eta);
  }
  if (std::abs(eta) < ana.eta_cut) {

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +1444 to +1455
std::vector<float> genJetEta = lstEff.getVF("genJetEta");
std::vector<float> genJetPhi = lstEff.getVF("genJetPhi");

float dEtaj = 999;
float dPhij = 999;
float dRj = 999;
float dRTemp = 999;
float genJetPtj;

for (unsigned int ijet; ijet < genJetPhi.size(); ++ijet) {
genJetPtj = genJetPt[ijet];
if(genJetPtj > 100) {
Copy link

Copilot AI Jan 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The ΔR calculation logic is duplicated across three functions (fillFakeRateSet, fillDuplicateRateSet, fillFakeOrDuplicateRateSet). This duplicated code makes maintenance harder and increases the risk of inconsistencies. Consider extracting this logic into a helper function that computes the minimum ΔR to the closest GenJet.

Copilot uses AI. Check for mistakes.
Comment on lines 1416 to +1443
}
if (ana.jet_branches) {
std::vector<float> genJetPt = lstEff.getVF("genJetPt");
Copy link

Copilot AI Jan 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The GenJet vectors are being retrieved from the input file inside this function that is called for every reconstructed track. This is inefficient as it repeatedly reads the same data. Consider moving this retrieval outside the loop or to a higher scope where it can be done once per event.

Copilot uses AI. Check for mistakes.
float dRTemp = 999;
for (auto const& jet : genJets) {
dEtaj = sim_eta - jet.eta();
dPhij = std::acos(std::cos(sim_phi - jet.phi()));
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
dPhij = std::acos(std::cos(sim_phi - jet.phi()));
dPhij = reco::deltaPhi(sim_phi, jet.phi());

Comment on lines +2810 to +2811
dRj = std::sqrt(std::pow(dEtaj,2) + std::pow(dPhij,2));
if(dRj < dRTemp) {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
dRj = std::sqrt(std::pow(dEtaj,2) + std::pow(dPhij,2));
if(dRj < dRTemp) {
dRj2 = std::pow(dEtaj,2) + std::pow(dPhij,2);
if(dRj2 < dRTemp) {

then below call sqrt only when filling the sim_deltaR

Comment on lines +2086 to +2088
t->Branch("sim_deltaEta", &sim_deltaEta);
t->Branch("sim_deltaPhi", &sim_deltaPhi);
t->Branch("sim_deltaR", &sim_deltaR);
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

move this to where other sim_ branches are for clarity

std::vector<float> genjet_phi;
std::vector<float> genjet_invisible_energy;
std::vector<float> genjet_auxiliary_energy;
std::vector<float> sim_deltaEta;
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
std::vector<float> sim_deltaEta;
std::vector<float> sim_genJetDEta;

or

Suggested change
std::vector<float> sim_deltaEta;
std::vector<float> sim_genjet_deltaEta;

for clarity

Comment on lines +2817 to +2818
sim_deltaEta.push_back(dEtaTemp);
sim_deltaPhi.push_back(dPhiTemp);
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

instead of these it would be more useful to save the index of the genjet

jet_eta.at(isimtrk),
jet_phi.at(isimtrk),
jet_pt.at(isimtrk));
if (genJetPt.at(isimtrk) > 100) {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm surprised this is not crashing: IIUC genJetPt is per jet while isimtrk is per sim track and there are many more sim tracks

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants