Utilizing Machine Learning Techniques for Particle Identification in PANDA experiment
To be able to use this code you have to be sure that Python interpreter is installed on your machine. If you do not have Python, Please navigate to Python website https://www.python.org/. It is preferable to install Python2 since we are using ROOT, and usually Python3 with ROOT have a lot of problems.
- Having Python2 installed we can now install the necessary Python modules. The required packages are:
- ROOT (Python bindings of ROOT https://root.cern.ch/building-root).
- ipython
- jupyter (optional)
- scipy
- numpy
- root_numpy
- pandas
- matplotlib
- seaborn
- tensorflow (for deep learning)
- tensorboard (for deep learning)
- keras (for deep learning)
 
to install any of these modules use pip, e.g pip install ipython. If you do not know the exact name of the module use pip search
and to narrow the search use grep with pip, e.g pip search numpy | grep numpy.
- The second step is to copy these two files PndPidMlAssociatorTask.cxxandPndPidMlAssociatorTask.hto:
 yourPandaRootDirectory/source/pid/PidClassifier/. Then openCMakeLists.txtand add the following lines:
 PidClassifier/PndPidMlAssociatorTask.cxxandPidClassifier/PndPidMlAssociatorTask.h.
You can find easily where to paste these lines. Then open PidLinkDef.h and add the following line:
#pragma link C++ class  PndPidMlAssociatorTask+;
Finally compile PANDARoot again from the build directory.
- WOW!. We are ready, the most important file here is Classifier.py. Open ipython from shell prompot, just typeipython, then import theClassifiermodule by typing inside ipythonimport Classifier.
The module contains some useful method, here is a list of the methods:
- loadmodel()
- loadmyfile()
- getListofFeatures()
- predict()
- getRootPrediction()
- PlotCorrMatrix()
- PlotFeature()
- PlotScatter()
to learn about any of these methods (what it does, what arguments it take, what it's return value) you can use help builtin method
just type in ipython e.g help(Classifier.PlotFeature), then you will get a documentation of the method.
For the purpose of using machine learning output probability distributions in your analysis, the getRootPrediction() method is
in hand, it accepts three parameters
- input file (pid file, last step of PANDARoot simulation).
- input trained model (at the moment only Random Forest classifier is available savedRF.pkl).
- output file (which will contain Ml probability distributions). an example call in ipython would be:
Classifier.getRootPrediction('signal_pid.root', 'savedRF.pkl', 'signal_ml.root')
please make sure that signal_pid.root and savedRF.pkl exist in the same directory. This function will create named
signal_ml.root which contain Ml output probability distributions.
- 
Finally in your analysis code tut_ana.CusefRun->AddFriend("signal_ml.root");to be able to access the Ml output probability distributions.
- 
For example tight selection of muons using Ml would be in the analysis code: 
theAnalysis->FillList(muminus, "MuonTightMinus", "PidAlgoMl");
Note: the analysis code taken from http://panda-wiki.gsi.de/cgi-bin/view/Computing/PandaRootRhoTutorial
- 
You can plot a histogram of any of the input features using PlotFeature(), or you can plot a scatter between two features usingPlotScatter().
- 
You can plot a correlation matrix between all features by using PlotCorrMatrix().
Have Fun!