Commit da2f73a

Added project report
Signed-off-by: Deepanshu <[email protected]>
1 parent bc111da commit da2f73a

2 files changed, +179 -3 lines changed

docs/intro.rst

Lines changed: 3 additions & 3 deletions
@@ -1,10 +1,10 @@
 .. image:: hydra_logo.png
 :scale: 25 %
-
+
 About this project
 ==================
 The **Hydra.Python** package provides the Python bindings for the header-only C++ `Hydra`_ library.
-This library is an abstraction over the C++ library, so that daily work can be code and run with the much simpler Python language,
+This library is an abstraction over the C++ library, so that daily work can be code and run with the Python language,
 concentrating on the logic and leaving all the complex memory management and optimisations to the C++ library.
 
 The bindings are produced with `pybind11`_. The project makes use of `CMAKE`_.
@@ -41,4 +41,4 @@ History
 The development of **Hydra.Python** started as a
 2017 Google Summer of Code project (`GSoC`_) with student Deepanshu Thakur.
 
-.. _GSoC: https://summerofcode.withgoogle.com/projects/#6669304945704960
+.. _GSoC: https://summerofcode.withgoogle.com/projects/#6669304945704960

project_report.rst

Lines changed: 176 additions & 0 deletions
@@ -0,0 +1,176 @@
###############
Project Report:
###############

***************************************************************
Google Summer of Code 2017
***************************************************************

===============================================================
Umbrella Organization: CERN-HSF, CERN’s HEP software foundation
===============================================================

================================================================================================================================
Project: Efficient Python routines for analysis on massively multi-threaded platforms-Python bindings for the Hydra C++ library
================================================================================================================================

Submitted by: Deepanshu Thakur
******************************

I spent the last three months working on my `GSoC project`_, which consisted
of writing Python bindings for the Hydra C++ library. Hydra is a header-only,
templated C++11 library designed to perform common High Energy Physics data
analyses on massively parallel platforms, and it is designed and used to run
on Linux. The idea of this GSoC project is to provide bindings for the Hydra
library, so that Python can be used with Hydra for prototyping or
development.

.. _GSoC project: https://summerofcode.withgoogle.com/projects/#6669304945704960

My final output looks a little different from the deliverables in my original
proposal, and there are good reasons for that. They will become evident in
the discussion of the design challenges and choices later in the report. In
the beginning the goal was to write bindings for ``Data Fitting``,
``Random Number Generation``, ``Phase-Space Monte Carlo Simulation``,
``Functor Arithmetic`` and ``Numerical integration``, but we ended up with
bindings for ``Random Number Generation`` and
``Phase-Space Monte Carlo Simulation`` only. (The remaining classes can be
bound with some extra effort, but there is no time left within the GSoC
period, so I have decided to continue with the project outside the scope of
GSoC.)

Let me take you through my three-month journey. The first step was to find a
tool or package for writing the bindings. Several options were available in
principle, and in the beginning we evaluated `SWIG`_. The problem with SWIG
is that it is very complicated to use and that it does not fully support
``variadic templates``, while the `Thrust library`_ underlying Hydra depends
heavily on variadic templates. After trying SWIG and realizing that it could
not fulfill our requirements, we turned our attention to `Boost.Python`_,
which looked quite promising. However, it is part of a very large and complex
suite that relies on many tweaks and hacks to work on almost any compiler,
and this comes with added complexity and cost. Finally we turned our
attention to `pybind11`_. To quote the pybind11 documentation:

    Boost is an enormously large and complex suite of utility libraries
    that works with almost every C++ compiler in existence. This compatibility
    has its cost: arcane template tricks and workarounds are necessary to
    support the oldest and buggiest of compiler specimens. Now
    that C++11-compatible compilers are widely available, this heavy
    machinery has become an excessively large and unnecessary dependency.

After investigating many things and trying `various programs`_, we decided
to go ahead with pybind11. The next step was to `familiarize myself`_ with
pybind11.

.. _SWIG: http://swig.org
.. _Thrust library: https://github.com/andrewcorrigan/thrust-multi-permutation-iterator
.. _Boost.Python: http://www.boost.org/doc/libs/1_65_0/libs/python/doc/html/index.html
.. _pybind11: https://github.com/pybind/pybind11
.. _various programs: https://github.com/Deepanshu2017/boost.python_practise
.. _familiarize myself: https://github.com/Deepanshu2017/pybind11_practise

Next we needed to solve the basic design problem, which is the `CRTP idiom`_.
The Hydra library relies on the CRTP idiom to avoid runtime overhead. CRTP is
a form of static (compile-time) polymorphism, so every template parameter,
including the number N of particles in the final state, must be known when
the bindings are compiled. I investigated CRTP a lot, and it took a while to
come up with a solution that works with any N. The idea I implemented was to
take a parameter from Python and, based on that parameter, write the bindings
into a new file, then compile and load them at runtime with system calls.
Unfortunately, generating and compiling bindings at runtime takes a lot of
time, and it is not feasible for a user to wait a few minutes each time
before being able to use the generated package. We therefore decided to go
ahead with a fixed set of values, which means we generate bindings for a
limited number of particles. Currently the Python bindings support up to 10
particles in the final state (N = 10). This can be extended to any number we
want: the binding code is written within a macro, so supporting an
additional value of N is just a matter of adding one extra macro call.

.. _CRTP idiom: https://en.wikipedia.org/wiki/Curiously_recurring_template_pattern

With the approach decided, we jumped into writing the Hydra bindings
(finally, after so many complications, although unfortunately this was not
the end of them). We decided to bind the most important classes first:
``Random Number Generation`` and ``Phase-Space Monte Carlo Simulation``.
My mentors decided that they would bind ``Random Number Generation``, while
``Phase-Space Monte Carlo Simulation`` was my responsibility. The rest of
this report explains more about Phase-Space Monte Carlo Simulation.

The ``PhaseSpace`` class of Hydra (“Phase-Space Monte Carlo Simulation”) is
used to generate phase-space Monte Carlo samples.

The events are generated in the center-of-mass frame, and the decay products
are then boosted using the betas (velocity components) of the original
particle. The code is based on the Raubold and Lynch method as documented in
`F. James, Monte Carlo Phase Space, CERN 68-15 (1968) <https://cds.cern.ch/record/275743>`_.
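
The boost step can be written out explicitly. This is a plain Python sketch
of a standard Lorentz boost in natural units (c = 1), not Hydra's
implementation. The function names ``beta_of``, ``boost`` and ``mass`` and
the ``(E, px, py, pz)`` tuple layout are assumptions chosen for this example.

```python
import math


def beta_of(p4):
    """Velocity components (bx, by, bz) of a four-momentum (E, px, py, pz)."""
    e, px, py, pz = p4
    return (px / e, py / e, pz / e)


def boost(p4, beta):
    """Lorentz-boost p4 = (E, px, py, pz) by the velocity beta = (bx, by, bz)."""
    e, px, py, pz = p4
    bx, by, bz = beta
    b2 = bx * bx + by * by + bz * bz
    if b2 >= 1.0:
        raise ValueError("beta must describe a speed below c")
    gamma = 1.0 / math.sqrt(1.0 - b2)
    bp = bx * px + by * py + bz * pz
    # (gamma - 1) / beta^2 multiplies the component of p along beta;
    # special-case beta == 0 to avoid dividing by zero.
    gamma2 = (gamma - 1.0) / b2 if b2 > 0.0 else 0.0
    return (
        gamma * (e + bp),
        px + gamma2 * bp * bx + gamma * bx * e,
        py + gamma2 * bp * by + gamma * by * e,
        pz + gamma2 * bp * bz + gamma * bz * e,
    )


def mass(p4):
    """Invariant mass of a four-momentum (E, px, py, pz)."""
    e, px, py, pz = p4
    return math.sqrt(max(e * e - px * px - py * py - pz * pz, 0.0))
```

A daughter generated in the mother's rest frame is moved to the laboratory
frame with ``boost(daughter, beta_of(mother))``. The invariant mass of the
daughter is unchanged by the boost, which makes a convenient sanity check.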

Momenta are in GeV/c and masses in GeV/c^2. The ``PhaseSpace`` Monte Carlo
class depends on the ``Vector3R``, ``Vector4R`` and ``Events`` classes, so it
cannot be bound until all of these classes have bindings.

The ``Vector3R`` and ``Vector4R`` classes were bound first. There were some
problems, such as generating the ``__eq__`` and ``__ne__`` methods for the
Python side, but I solved them by writing lambda functions that iterate over
the components and check whether they satisfy the condition. The
``Vector4R`` (four-vector) class represents a particle. The plan was to
first bind the particle class (the four-vector class), then bind the
``Events`` class that holds the phase space generated by the ``PhaseSpace``
class, and finally bind the ``PhaseSpace`` class itself. The ``Events``
class was not easy to bind because it depends on ``hydra::multiarray``, and
without bindings for that the ``Events`` class was impossible to bind.
Thanks to my mentor, these bindings had already been written for the
``Random`` class, with some tweaks to pybind11’s bind_container itself. We
even ran into some design issues with the Events class in Hydra itself.
Eventually, after solving these problems, I had the Events class working,
and I converted the binding code into a macro so that the Events class can
be used with up to 10 particles.
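
The component-wise comparison behind ``__eq__`` and ``__ne__`` can be shown
in pure Python. The actual bindings implement this with C++ lambdas passed to
pybind11. The class below, ``FourVector``, is a hypothetical stand-in and not
the real ``Vector4R``. It only mirrors the idea of iterating over the
components and checking each one.

```python
class FourVector:
    """Hypothetical stand-in for a bound four-vector (not the real Vector4R)."""

    def __init__(self, e, px, py, pz):
        self._values = (float(e), float(px), float(py), float(pz))

    def __len__(self):
        return len(self._values)

    def __getitem__(self, index):
        return self._values[index]

    def __eq__(self, other):
        if not isinstance(other, FourVector):
            return NotImplemented
        # Equal only if every pair of components is equal.
        return all(a == b for a, b in zip(self._values, other._values))

    def __ne__(self, other):
        result = self.__eq__(other)
        return result if result is NotImplemented else not result
```

Note that defining ``__eq__`` without ``__hash__`` makes instances
unhashable in Python 3, so such objects cannot be used as dictionary keys
unless a hash is provided as well.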

Next came the actual bindings for the ``PhaseSpace`` class. The
``PhaseSpace`` class has constructors and the methods ``GetSeed``,
``SetSeed``, ``AverageOn``, ``Evaluate`` and ``Generate``.

``GetSeed`` and ``SetSeed`` were easy to implement. The remaining three
methods each have two versions: one that accepts a single mother particle
and one that accepts a list of mother particles. I succeeded in binding the
versions that accept a single mother particle but was unable to bind the
versions that accept a list of mother particles. I was trying to pass a list
of Events objects along with the list of mother particles. I could pass the
list of mother particles, but I could not find any way to pass the list of
Events objects without casting each Events object from a Python object in my
binding code. (Later I realized that this is impossible.) My mentor wrote
the bindings for the methods that accept a list of mother particles. After
looking at the binding code I realized, alas, that I had been making a very
stupid mistake: I had to pass a single Events object, not a list of Events
objects. I had in fact already tried that, but I never showed it to my
mentor because I thought it was a mistake. I learned a lesson from this:
always show your mentor what you did, even if you believe you are wrong. It
could save you some time. ;)
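
The shape of the working call can be sketched as follows. This is a purely
illustrative Python model and not the Hydra.Python API: the ``Events`` class,
the ``generate`` function and the ``decay_one`` callback are all hypothetical
names. The point is only that a list of mothers is paired with one container
holding one slot per mother, rather than with a list of containers.

```python
class Events:
    """Hypothetical container with one slot per generated decay."""

    def __init__(self, nevents):
        self._slots = [None] * nevents

    def __len__(self):
        return len(self._slots)

    def __getitem__(self, index):
        return self._slots[index]

    def __setitem__(self, index, decay):
        self._slots[index] = decay


def generate(mothers, events, decay_one):
    """Fill events[i] with the decay of mothers[i] using a caller-supplied decay_one."""
    if len(mothers) != len(events):
        raise ValueError("need exactly one Events slot per mother particle")
    for index, mother in enumerate(mothers):
        events[index] = decay_one(mother)
    return events
```

With three mothers, the caller allocates ``Events(3)`` once and passes that
single object to ``generate`` together with the list of mothers.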

After completing the PhaseSpace code, I quickly converted it into a macro to
support up to 10 particles.

Now the PhaseSpace class was working perfectly! The next step was to create
a series of test cases, the documentation and, of course, an example of the
PhaseSpace class in action. The remaining algorithms that I named at the
start of this report are still left to implement.

GSoC 2017 was a great learning experience for me. I learned a lot, not only
about programming but also about high energy physics. I learned about
*Monte Carlo simulations* and how they can be used to solve challenging
real-life problems. I read and studied a research paper
( https://cds.cern.ch/record/275743/files/CERN-68-15.pdf ), learned about
particle decays, gained insight into C++ variadic templates, wrote a blog
post about CRTP ( #TODO insert blog link), learned how to compile a Python
function, and learned why simple Python functions cannot be used in
multithreaded environments. Most importantly, I learned how to structure a
project from scratch and how important documentation and test cases are.

A shoutout to my amazing mentors. I would like to thank
Dr. Antonio Augusto Alves Jr. and Eduardo Rodrigues for being awesome
mentors and for all the time they invested in me during GSoC. I would also
like to thank the CERN-HSF community for their time and for helping me
whenever I had a problem. Thank you!