Commit d590de0

DOCS-#3719: Improve documentation for pandas_on_python execution (#3775)
Signed-off-by: Alexey Prutskov <[email protected]>
1 parent cd8db0c commit d590de0

4 files changed

+70
-2
lines changed
docs/flow/modin/core/execution/python/implementations/pandas_on_python/index.rst

Lines changed: 58 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -1,5 +1,33 @@
1+
PandasOnPython Execution
2+
========================
3+
4+
Queries that perform data transformation, data ingress or data egress using the `pandas on Python` execution
5+
pass through the Modin components detailed below.
6+
7+
`pandas on Python` execution is sequential and is used for debugging purposes. To enable `pandas on Python` execution,
8+
please refer to the usage section in :doc:`pandas on Python </UsingPandasonPython/index>`.
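As a minimal sketch of what enabling this execution can look like, the snippet below selects the Python engine and the pandas storage format through the ``MODIN_ENGINE`` and ``MODIN_STORAGE_FORMAT`` environment variables before Modin is imported. The guarded import is an assumption made so the snippet also runs where Modin is not installed; the linked usage section remains the authoritative reference.

```python
import os

# Select sequential execution before Modin is imported; Modin reads these
# environment variables when its configuration is first accessed.
os.environ["MODIN_ENGINE"] = "python"
os.environ["MODIN_STORAGE_FORMAT"] = "pandas"

try:
    import modin.pandas as pd  # requires Modin to be installed
except ImportError:
    pd = None  # assumption: degrade gracefully when Modin is absent

if pd is not None:
    # Every operation on this frame now runs sequentially in one process.
    df = pd.DataFrame({"a": [1, 2, 3]})
    print(df.sum())
```

Setting the variables in the shell before starting Python should have the same effect as setting them through ``os.environ``.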
9+
10+
Data Transformation
11+
'''''''''''''''''''
12+
13+
.. image:: /img/pandas_on_python_data_transform.svg
14+
:align: center
15+
16+
When a user calls any :py:class:`~modin.pandas.dataframe.DataFrame` API, a query starts forming at the `API` layer
17+
to be executed at the `Execution` layer. The `API` layer is responsible for processing the query appropriately,
18+
for example, determining whether the final result should be a ``DataFrame`` or ``Series`` object. This layer is also responsible for sanitizing the input to the
19+
:py:class:`~modin.core.storage_formats.pandas.query_compiler.PandasQueryCompiler`, e.g. validating a parameter from the query
20+
and defining specific intermediate values to provide more context to the query compiler.
21+
The :py:class:`~modin.core.storage_formats.pandas.query_compiler.PandasQueryCompiler` is responsible for
22+
processing the query, received from the :py:class:`~modin.pandas.dataframe.DataFrame` `API` layer,
23+
to determine how to apply it to a subset of the data - either cell-wise or along an axis-wise partition backed by the `pandas`
24+
storage format. The :py:class:`~modin.core.storage_formats.pandas.query_compiler.PandasQueryCompiler` maps the query to one of the :doc:`Core Algebra Operators </flow/modin/core/dataframe/algebra>` of
25+
the :py:class:`~modin.core.execution.python.implementations.pandas_on_python.dataframe.dataframe.PandasOnPythonDataframe` which inherits
26+
generic functionality from the :py:class:`~modin.core.dataframe.pandas.dataframe.dataframe.PandasDataframe`.
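The layering described above can be pictured with a toy model. The sketch below is not Modin code: ``ToyAPI``, ``ToyQueryCompiler`` and ``ToyDataframe`` are hypothetical stand-ins that only mirror the division of responsibilities (input validation, mapping a query to an operator, and sequential application to partitions).

```python
from typing import Callable, List

# Hypothetical, simplified stand-ins for the layers described above.
# None of these classes are Modin's real implementations.


class ToyDataframe:
    """Holds row partitions and applies functions to them one by one."""

    def __init__(self, partitions: List[List[int]]):
        self.partitions = partitions

    def map(self, func: Callable[[int], int]) -> "ToyDataframe":
        # Sequential execution: each partition is processed in turn.
        return ToyDataframe([[func(v) for v in part] for part in self.partitions])

    def to_list(self) -> List[int]:
        return [v for part in self.partitions for v in part]


class ToyQueryCompiler:
    """Decides which operator of the dataframe a query maps to."""

    def __init__(self, frame: ToyDataframe):
        self.frame = frame

    def add_scalar(self, value: int) -> "ToyQueryCompiler":
        # A cell-wise query maps to the `map` operator.
        return ToyQueryCompiler(self.frame.map(lambda v: v + value))


class ToyAPI:
    """User-facing layer: validates input, then delegates to the compiler."""

    def __init__(self, values: List[int], partition_size: int = 2):
        parts = [
            values[i:i + partition_size]
            for i in range(0, len(values), partition_size)
        ]
        self._qc = ToyQueryCompiler(ToyDataframe(parts))

    def add(self, value: int) -> List[int]:
        if not isinstance(value, int):
            raise TypeError("value must be an int")
        return self._qc.add_scalar(value).frame.to_list()


result = ToyAPI([1, 2, 3, 4, 5]).add(10)
print(result)
```

The point of the split is that only the bottom layer knows about partitions, so the same query compiler logic can be reused when the dataframe implementation changes.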
27+
28+
1 29
PandasOnPython Dataframe implementation
2-
=======================================
30+
---------------------------------------
3 31

4 32
This page describes the implementation of :doc:`Modin PandasDataframe Objects </flow/modin/core/dataframe/pandas/index>`
5 33
specific to `PandasOnPython` execution. Since the Python engine doesn't allow computation parallelization,
@@ -17,4 +45,32 @@ perfomance speed-up, so ``PandasOnPython`` is used for testing purposes only.
17 45
dataframe
18 46
partitioning/partition
19 47
partitioning/axis_partition
20-
partitioning/partition_manager
48+
partitioning/partition_manager
49+
50+
51+
Data Ingress
52+
''''''''''''
53+
54+
.. image:: /img/pandas_on_python_data_ingress.svg
55+
:align: center
56+
57+
Data Egress
58+
'''''''''''
59+
60+
.. image:: /img/pandas_on_python_data_egress.svg
61+
:align: center
62+
63+
64+
When a user calls any IO function from the ``modin.pandas.io`` module, the `API` layer queries the
65+
:py:class:`~modin.core.execution.dispatching.factories.dispatcher.FactoryDispatcher` which defines a factory specific for
66+
the execution, namely, the :py:class:`~modin.core.execution.dispatching.factories.factories.PandasOnPythonFactory`. The factory, in turn,
67+
exposes the :py:class:`~modin.core.execution.python.implementations.pandas_on_python.io.PandasOnPythonIO` class
68+
which is responsible for reading from and writing to files.
69+
70+
When reading data from a CSV file, for example, the :py:class:`~modin.core.execution.python.implementations.pandas_on_python.io.io.PandasOnPythonIO` class
71+
reads the data using the corresponding `pandas` function (``pandas.read_csv()`` in this case). After the reading is complete, a new query compiler is created from the resulting `pandas` object
72+
using :py:meth:`~modin.core.execution.python.implementations.pandas_on_python.io.io.PandasOnPythonIO.from_pandas` and returned.
73+
74+
When writing data to a CSV file, for example, the :py:class:`~modin.core.execution.python.implementations.pandas_on_python.io.PandasOnPythonIO` converts a query compiler
75+
to a `pandas` object using :py:meth:`~modin.core.storage_formats.base.query_compiler.BaseQueryCompiler.to_pandas`. After that, `pandas` writes the data to the file using
76+
the corresponding method (``pandas.DataFrame.to_csv()`` in this case).
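The ingress and egress paths can be illustrated with a toy round trip. This is a sketch under stated assumptions, not Modin's implementation: ``ToyDispatcher``, ``ToyFactory``, ``ToyIO`` and ``ToyQueryCompiler`` are hypothetical names, and the standard-library ``csv`` module stands in for `pandas`.

```python
import csv
import io
from typing import Dict, List, TextIO

Rows = List[Dict[str, str]]

# Hypothetical stand-ins; none of these are Modin's real classes.


class ToyQueryCompiler:
    """Wraps the in-memory data that the upper layers operate on."""

    def __init__(self, rows: Rows):
        self._rows = rows

    @classmethod
    def from_rows(cls, rows: Rows) -> "ToyQueryCompiler":
        return cls(list(rows))

    def to_rows(self) -> Rows:
        return list(self._rows)


class ToyIO:
    """Reads and writes data, delegating parsing to an existing library."""

    @classmethod
    def read_csv(cls, buffer: TextIO) -> ToyQueryCompiler:
        # Ingress: parse first, then wrap the result in a query compiler.
        return ToyQueryCompiler.from_rows(list(csv.DictReader(buffer)))

    @classmethod
    def to_csv(cls, qc: ToyQueryCompiler, buffer: TextIO) -> None:
        # Egress: unwrap the query compiler, then let the library write.
        rows = qc.to_rows()
        writer = csv.DictWriter(
            buffer, fieldnames=list(rows[0].keys()), lineterminator="\n"
        )
        writer.writeheader()
        writer.writerows(rows)


class ToyFactory:
    """Exposes the IO class that belongs to one execution."""

    io_cls = ToyIO


class ToyDispatcher:
    """Routes IO calls from the API layer to the active factory."""

    _factory = ToyFactory

    @classmethod
    def read_csv(cls, buffer: TextIO) -> ToyQueryCompiler:
        return cls._factory.io_cls.read_csv(buffer)

    @classmethod
    def to_csv(cls, qc: ToyQueryCompiler, buffer: TextIO) -> None:
        cls._factory.io_cls.to_csv(qc, buffer)


source = io.StringIO("a,b\n1,2\n3,4\n")
qc = ToyDispatcher.read_csv(source)
target = io.StringIO()
ToyDispatcher.to_csv(qc, target)
round_trip = target.getvalue()
print(round_trip)
```

Swapping ``ToyFactory`` for another factory would change the IO class without touching the calling code, which is the role the dispatcher plays in the description above.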
Lines changed: 4 additions & 0 deletions