| 1 | +############### |
| 2 | +Project Report: |
| 3 | +############### |
| 4 | + |
| 5 | +*************************************************************** |
| 6 | +Google Summer of Code 2017 |
| 7 | +*************************************************************** |
| 8 | + |
| 9 | +=============================================================== |
| 10 | +Umbrella Organization: CERN-HSF, CERN’s HEP Software Foundation |
| 11 | +=============================================================== |
| 12 | + |
| 13 | +================================================================================================================================ |
| 14 | +Project: Efficient Python routines for analysis on massively multi-threaded platforms-Python bindings for the Hydra C++ library |
| 15 | +================================================================================================================================ |
| 16 | + |
| 17 | +Submitted by- Deepanshu Thakur |
| 18 | +****************************** |
| 19 | + |
| 20 | +I spent the last three months working on my `GSoC project`_, which was |
| 21 | +about writing Python bindings for the Hydra C++ library. Hydra is a |
| 22 | +header-only, templated C++11 library designed to perform common High |
| 23 | +Energy Physics data analyses on massively parallel platforms; it is |
| 24 | +designed for and used on Linux. The idea of this GSoC project is to |
| 25 | +provide Python bindings for Hydra, so that Python support can be added |
| 26 | +to the library and Python can be used for prototyping or |
| 27 | +development. |
| 28 | + |
| 29 | + |
| 30 | +.. _GSoC project: https://summerofcode.withgoogle.com/projects/#6669304945704960 |
| 31 | + |
| 32 | +The deliverables in my original proposal and my final output look a |
| 33 | +little different, and there are good reasons for that. The change of |
| 34 | +deliverables will become evident in the discussion of the design challenges |
| 35 | +and choices later in the report. In the beginning the goal was to write |
| 36 | +bindings for ``Data Fitting``, ``Random Number Generation``, |
| 37 | +``Phase-Space Monte Carlo Simulation``, ``Functor Arithmetic`` and |
| 38 | +``Numerical integration``, but we ended up with bindings for |
| 39 | +``Random Number Generation`` and ``Phase-Space Monte Carlo Simulation`` only. |
| 40 | +(The remaining classes can be bound with some extra effort, but there is |
| 41 | +no time left within the current scope of GSoC, so I have decided to |
| 42 | +continue with the project outside the scope of GSoC.) |
| 43 | + |
| 44 | + |
| 45 | +Choosing proper tools |
| 46 | +********************* |
| 47 | + |
| 48 | +Let me take you through my three-month journey. The first step was to find |
| 49 | +a tool or package for writing the bindings. Several options were in |
| 50 | +principle available. In the beginning we evaluated |
| 51 | +`SWIG`_. |
| 52 | +SWIG had two problems for us: it is very complicated to use, and it |
| 53 | +does not support the ``variadic templates`` that Hydra’s underlying |
| 54 | +`Thrust library`_ depends on heavily. After trying our hands at |
| 55 | +SWIG and realizing it could not fulfill our requirements, we turned our |
| 56 | +attention to `Boost.Python`_. It looked quite promising, but it is part of |
| 57 | +a very large and complex suite that relies on many tweaks and hacks to |
| 58 | +work on almost any compiler, and that compatibility adds a lot of |
| 59 | +complexity and cost. Finally we turned our attention to `pybind11`_. |
| 60 | +The pybind11 documentation puts it this way: |
| 61 | + |
| 62 | + Boost is an enormously large and complex suite of utility libraries |
| 63 | + that works with almost every C++ compiler in existence. This compatibility |
| 64 | + has its cost: arcane template tricks and workarounds are necessary to |
| 65 | + support the oldest and buggiest of compiler specimens. Now |
| 66 | + that C++11-compatible compilers are widely available, this heavy |
| 67 | + machinery has become an excessively large and unnecessary dependency. |
| 68 | + |
| 69 | +After investigating the options and trying `various programs`_, we decided |
| 70 | +to go ahead with pybind11. The next step was to `familiarize myself`_ with it. |
| 71 | + |
| 72 | +.. _SWIG: http://swig.org |
| 73 | +.. _Thrust library: https://github.com/andrewcorrigan/thrust-multi-permutation-iterator |
| 74 | +.. _Boost.Python: http://www.boost.org/doc/libs/1_65_0/libs/python/doc/html/index.html |
| 75 | +.. _pybind11: https://github.com/pybind/pybind11 |
| 76 | +.. _various programs: https://github.com/Deepanshu2017/boost.python_practise |
| 77 | +.. _familiarize myself: https://github.com/Deepanshu2017/pybind11_practise |
| 78 | + |
| 79 | + |
| 80 | +The Basic design problem |
| 81 | +************************ |
| 82 | + |
| 83 | +Next we needed to solve the basic design problem, which is the `CRTP idiom`_. |
| 84 | +The Hydra library relies on the CRTP idiom to avoid runtime overhead. If |
| 85 | +you are not familiar with CRTP, it is a form of static, or compile-time, |
| 86 | +polymorphism. This matters for the bindings because the number of particles |
| 87 | +in the final state (denoted by N) is a template parameter, so it must be |
| 88 | +known when the bindings are compiled. I investigated CRTP at length, and it |
| 89 | +took a while to come up with a solution that works for any N. The idea I |
| 90 | +implemented was to take a parameter from Python and, based on that |
| 91 | +parameter, write the bindings to a new file, then compile and load them |
| 92 | +at runtime with system calls. Unfortunately, generating and compiling |
| 93 | +bindings at runtime takes a lot of time, and it is not feasible to make |
| 94 | +the user wait a few minutes each time before being able to use the |
| 95 | +generated package. We therefore decided to generate bindings for a fixed, |
| 96 | +limited set of values of N. Currently the Python bindings support up to |
| 97 | +10 particles in the final state (N = 10). This can be extended to any |
| 98 | +number we want: our binding code is written within a macro, so supporting |
| 99 | +an extra value of N is just a matter of adding one more macro call. |
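
The fixed-N approach can be sketched in plain C++ as follows. This is only a minimal illustration under stated assumptions, not the actual Hydra or binding code: ``GeneratorBase``, ``ToyPhaseSpace``, ``Registry``, ``REGISTER_PHASESPACE`` and ``RegisterAll`` are hypothetical names, and a vector of strings stands in for the Python module in which the real bindings would register each class.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// CRTP base: the call into the derived class is resolved at compile time,
// so no virtual dispatch is involved.
template <typename Derived>
struct GeneratorBase {
    std::size_t FinalStateSize() const {
        return static_cast<const Derived&>(*this).size_impl();
    }
};

// Hypothetical stand-in for a class templated on the number N of
// final-state particles (not Hydra's actual PhaseSpace class).
template <std::size_t N>
struct ToyPhaseSpace : GeneratorBase<ToyPhaseSpace<N>> {
    std::array<double, N> masses{};
    std::size_t size_impl() const { return N; }
};

// Stand-in for the module: it only records the exposed class names.
inline std::vector<std::string>& Registry() {
    static std::vector<std::string> names;
    return names;
}

// One macro call instantiates the template for one value of N and
// registers it under a name such as "PhaseSpace3".
#define REGISTER_PHASESPACE(N)                                   \
    do {                                                         \
        ToyPhaseSpace<N> instance;                               \
        assert(instance.FinalStateSize() == std::size_t{N});     \
        Registry().push_back("PhaseSpace" #N);                   \
    } while (0)

// Supporting N = 11 would only need one more line here.
inline void RegisterAll() {
    REGISTER_PHASESPACE(1);
    REGISTER_PHASESPACE(2);
    REGISTER_PHASESPACE(3);
    REGISTER_PHASESPACE(4);
    REGISTER_PHASESPACE(5);
    REGISTER_PHASESPACE(6);
    REGISTER_PHASESPACE(7);
    REGISTER_PHASESPACE(8);
    REGISTER_PHASESPACE(9);
    REGISTER_PHASESPACE(10);
}
```

Calling ``RegisterAll()`` once fills the registry with ten names. Every instantiation is paid for at compile time, which is the trade-off against generating and compiling bindings on demand.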
| 100 | + |
| 101 | +.. _CRTP idiom: https://en.wikipedia.org/wiki/Curiously_recurring_template_pattern |
| 102 | + |
| 103 | + |
| 104 | +The Hydra Binding |
| 105 | +***************** |
| 106 | + |
| 107 | +With the approach decided, we jumped into the Hydra bindings themselves |
| 108 | +(finally, after so many complications, although unfortunately this was not |
| 109 | +the end of them). We decided to bind the most important classes first: |
| 110 | +``Random Number Generation`` and ``Phase-Space Monte Carlo Simulation``. |
| 111 | +My mentors decided that they would bind ``Random Number Generation``, while |
| 112 | +``Phase-Space Monte Carlo Simulation`` was my responsibility. The rest of |
| 113 | +this report focuses on Phase-Space Monte Carlo Simulation. |
| 114 | + |
| 115 | +The “Phase-Space Monte Carlo Simulation” class, ``PhaseSpace`` in Hydra’s |
| 116 | +C++ code, generates phase-space Monte Carlo samples. |
| 117 | + |
| 118 | + The events are generated in the center-of-mass frame, but the decay products |
| 119 | + are finally boosted using the betas of the original particle. The code is |
| 120 | + based on the Raubold and Lynch method as documented in |
| 121 | + [F. James, Monte Carlo Phase Space, CERN 68-15 (1968)] |
| 122 | + (https://cds.cern.ch/record/275743). |
| 123 | + |
| 124 | +Momentum and energy are expressed in GeV/c and GeV, respectively. The |
| 125 | +``PhaseSpace`` class depends on the ``Vector3R``, ``Vector4R`` and ``Events`` |
| 126 | +classes, so it cannot be bound until all of those classes are bound. |
| 127 | + |
| 128 | +The ``Vector3R`` and ``Vector4R`` classes were bound first. There were some |
| 129 | +problems, such as generating the ``__eq__`` and ``__ne__`` methods for the |
| 130 | +Python side, but I solved them by writing a ``lambda function`` that iterates |
| 131 | +over the components and checks whether they satisfy the condition. The |
| 132 | +``Vector4R`` or four-vector class represents a particle. The plan was to bind |
| 133 | +the particle class (the four-vector class) first, then the ``Events`` class |
| 134 | +that holds the phase space generated by ``PhaseSpace``, and finally the |
| 135 | +``PhaseSpace`` class itself. The ``Events`` class was not easy to bind: |
| 136 | +it depends on ``hydra::multiarray``, and without bindings for that container |
| 137 | +it could not be bound at all. Thankfully my mentor had already written those |
| 138 | +bindings for the ``Random`` class, with some tweaks to pybind11’s |
| 139 | +bind_container itself. We even faced some design issues with the ``Events`` |
| 140 | +class in Hydra itself. Eventually, with these problems solved, I had the |
| 141 | +``Events`` class working, and I converted the binding code into a macro |
| 142 | +so that ``Events`` can be used with up to 10 particles. |
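
The component-wise comparison behind the equality and inequality methods can be sketched in plain C++ as below. This is an illustration under stated assumptions rather than the code used in the bindings: ``ToyVector4``, ``vec_eq``, ``vec_ne`` and ``CompareExample`` are hypothetical names, and the struct is not Hydra's ``Vector4R``. In pybind11, lambdas of this shape can be passed to ``.def("__eq__", ...)`` and ``.def("__ne__", ...)`` on the bound class.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical four-vector with components (E, px, py, pz).
struct ToyVector4 {
    std::array<double, 4> v;
};

// Equality: walk over the components and stop at the first mismatch.
const auto vec_eq = [](const ToyVector4& a, const ToyVector4& b) -> bool {
    for (std::size_t i = 0; i < a.v.size(); ++i) {
        if (a.v[i] != b.v[i]) return false;
    }
    return true;
};

// Inequality is the negation of equality, so the two can never disagree.
const auto vec_ne = [](const ToyVector4& a, const ToyVector4& b) -> bool {
    return !vec_eq(a, b);
};

// Small usage example.
inline void CompareExample() {
    const ToyVector4 p{{5.0, 0.0, 0.0, 3.0}};
    const ToyVector4 q{{5.0, 0.0, 0.0, 3.0}};
    const ToyVector4 r{{5.0, 0.0, 1.0, 3.0}};
    assert(vec_eq(p, q));  // identical components
    assert(vec_ne(p, r));  // differs in the third component
}
```

The comparison here is exact; whether a tolerance is more appropriate for floating-point components is a separate design choice that this sketch does not address.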
| 143 | + |
| 144 | +Now came the actual bindings for the ``PhaseSpace`` class. The ``PhaseSpace`` |
| 145 | +class has constructors and methods such as ``GetSeed``, ``SetSeed``, ``AverageOn``, ``Evaluate`` and ``Generate``. |
| 146 | + |
| 147 | + |
| 148 | +``GetSeed`` and ``SetSeed`` were easy to bind. The remaining three methods |
| 149 | +each have two versions: one that accepts a single mother particle and one |
| 150 | +that accepts a list of mother particles. I succeeded in binding the versions |
| 151 | +that accept a single mother particle, but I was unable to bind the versions |
| 152 | +that accept a list of mother particles. I was trying to pass a list of |
| 153 | +``Events`` objects along with the list of mother particles. Passing the list |
| 154 | +of mother particles worked, but I could not find any way to pass a list of |
| 155 | +``Events`` objects without casting each one from a Python object in my |
| 156 | +binding code. (Later I realized that this is impossible.) My mentor wrote |
| 157 | +the bindings for the methods that accept a list of mother particles. After |
| 158 | +looking at the binding code I realized, alas, that I had made a very silly |
| 159 | +mistake. I had to pass a ``single Events object, not a list of Events objects``. |
| 160 | +I had already written that version, but I never showed it to my mentor |
| 161 | +because I thought it was wrong. The lesson I learned: always show your |
| 162 | +mentor what you did, even if you believe it is wrong. It could save you some time. ;) |
| 163 | + |
| 164 | +After completing the ``PhaseSpace`` bindings, I quickly converted the code |
| 165 | +into a macro to support up to 10 particles. |
| 166 | + |
| 167 | +Now the ``PhaseSpace`` class was working perfectly! The next step was to |
| 168 | +create a series of test cases, documentation and, of course, an example of |
| 169 | +the ``PhaseSpace`` class in action. The remaining algorithms that I named |
| 170 | +at the start of this report are still left to bind. |
| 171 | + |
| 172 | + |
| 173 | +The happy learning |
| 174 | +****************** |
| 175 | + |
| 176 | +GSoC 2017 was a great learning experience for me. I learned a lot, not |
| 177 | +only about programming but also about high energy physics. |
| 178 | +I learned about *Monte Carlo Simulations* and how they can be used to solve |
| 179 | +challenging real-life problems. I read and studied a research paper |
| 180 | +( https://cds.cern.ch/record/275743/files/CERN-68-15.pdf ), learned about |
| 181 | +particle decays, gained insight into C++ variadic templates, |
| 182 | +wrote a blog post about `CRTP`_, and learned how to compile a |
| 183 | +Python function and why plain Python functions cannot be used in |
| 184 | +multithreaded environments. Most importantly, I learned how to structure |
| 185 | +a project from scratch and how important documentation and test cases are. |
| 186 | + |
| 187 | + |
| 188 | +.. _CRTP: https://medium.com/@deepanshu2017/a-curiously-recurring-python-d3a441a58174 |
| 189 | + |
| 190 | + |
| 191 | +Special Thanks |
| 192 | +************** |
| 193 | + |
| 194 | +A shoutout to my amazing mentors: I would like to thank |
| 195 | +Dr. Antonio Augusto Alves Jr. and Eduardo Rodrigues for being awesome |
| 196 | +mentors and for all the time they invested in me during GSoC. I would also |
| 197 | +like to thank the CERN-HSF community for their time and for helping me |
| 198 | +whenever I had a problem. Thank you! |