When a user calls any :py:class:`~modin.pandas.dataframe.DataFrame` API, a query starts forming at the `API` layer
to be executed at the `Execution` layer. The `API` layer is responsible for processing the query appropriately,
for example, determining whether the final result should be a ``DataFrame`` or a ``Series`` object. This layer also
sanitizes the input to the
:py:class:`~modin.core.storage_formats.pandas.query_compiler.PandasQueryCompiler`, e.g. by validating a parameter from the query
and defining specific intermediate values that give the query compiler more context.
The :py:class:`~modin.core.storage_formats.pandas.query_compiler.PandasQueryCompiler` receives the query from the
:py:class:`~modin.pandas.dataframe.DataFrame` `API` layer and determines how to apply it to a subset of the data,
either cell-wise or along an axis-wise partition backed by the `pandas` storage format.
It then maps the query to one of the :doc:`Core Algebra Operators </flow/modin/core/dataframe/algebra>` of
the :py:class:`~modin.core.execution.python.implementations.pandas_on_python.dataframe.dataframe.PandasOnPythonDataframe`, which inherits
generic functionality from the :py:class:`~modin.core.dataframe.pandas.dataframe.dataframe.PandasDataframe`.

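The layering described above can be illustrated with a minimal, pure-Python sketch. It does not use Modin: ``ToyDataFrame``, ``ToyQueryCompiler`` and ``ToyCoreFrame`` are hypothetical stand-ins for the `API` layer, the query compiler and the core dataframe, and the ``map`` and ``reduce_columns`` methods only mimic the idea of a cell-wise versus an axis-wise operator.

```python
from typing import Callable, List

Partition = List[List[float]]  # a block of rows


class ToyCoreFrame:
    """Hypothetical core dataframe: owns the row partitions."""

    def __init__(self, partitions: List[Partition]):
        self.partitions = partitions

    def map(self, func: Callable[[float], float]) -> "ToyCoreFrame":
        # Cell-wise operator: every partition is processed independently.
        return ToyCoreFrame(
            [[[func(cell) for cell in row] for row in part] for part in self.partitions]
        )

    def reduce_columns(self, func: Callable[[List[float]], float]) -> List[float]:
        # Axis-wise operator: whole columns are needed, so the row
        # partitions are stitched together before ``func`` is applied.
        rows = [row for part in self.partitions for row in part]
        return [func(list(column)) for column in zip(*rows)]


class ToyQueryCompiler:
    """Hypothetical query compiler: picks an operator for each query."""

    def __init__(self, frame: ToyCoreFrame):
        self.frame = frame

    def abs(self) -> "ToyQueryCompiler":
        return ToyQueryCompiler(self.frame.map(abs))

    def sum(self) -> List[float]:
        return self.frame.reduce_columns(sum)


class ToyDataFrame:
    """Hypothetical API layer: validates input and shapes the result."""

    def __init__(self, query_compiler: ToyQueryCompiler):
        self._query_compiler = query_compiler

    def abs(self) -> "ToyDataFrame":
        # The result keeps the frame shape, so it is wrapped in a frame again.
        return ToyDataFrame(self._query_compiler.abs())

    def sum(self, axis: int = 0) -> List[float]:
        # Sanitize the parameter before it reaches the query compiler.
        if axis != 0:
            raise ValueError("this sketch only supports axis=0")
        # One value per column: the Series-like result of the query.
        return self._query_compiler.sum()


frame = ToyDataFrame(ToyQueryCompiler(ToyCoreFrame([[[1.0, -2.0]], [[-3.0, 4.0]]])))
column_sums = frame.abs().sum()
```

In this sketch ``abs`` never needs to look outside a single partition, while ``sum`` has to see every row of a column, which is the distinction the query compiler makes when it chooses an operator.
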

PandasOnPython Dataframe implementation
---------------------------------------

This page describes the implementation of :doc:`Modin PandasDataframe Objects </flow/modin/core/dataframe/pandas/index>`
specific to `PandasOnPython` execution. Since the Python engine doesn't allow computation parallelization,
@@ -17,4 +45,32 @@ perfomance speed-up, so ``PandasOnPython`` is used for testing purposes only.
    dataframe
    partitioning/partition
    partitioning/axis_partition
    partitioning/partition_manager

Data Ingress
''''''''''''

.. image:: /img/pandas_on_python_data_ingress.svg
   :align: center

Data Egress
'''''''''''

.. image:: /img/pandas_on_python_data_egress.svg
   :align: center

When a user calls any IO function from the ``modin.pandas.io`` module, the `API` layer queries the
:py:class:`~modin.core.execution.dispatching.factories.dispatcher.FactoryDispatcher`, which defines a factory specific to
the execution, namely the :py:class:`~modin.core.execution.dispatching.factories.factories.PandasOnPythonFactory`. The factory, in turn,
exposes the :py:class:`~modin.core.execution.python.implementations.pandas_on_python.io.PandasOnPythonIO` class,
which is responsible for reading from and writing to files.

When reading data from a CSV file, for example, the :py:class:`~modin.core.execution.python.implementations.pandas_on_python.io.io.PandasOnPythonIO` class
reads the data using the corresponding `pandas` function (``pandas.read_csv()`` in this case). Once reading is complete, a new query compiler is created from the resulting `pandas` object
using :py:meth:`~modin.core.execution.python.implementations.pandas_on_python.io.io.PandasOnPythonIO.from_pandas` and returned.

When writing data to a CSV file, for example, the :py:class:`~modin.core.execution.python.implementations.pandas_on_python.io.PandasOnPythonIO` class converts the query compiler
to a `pandas` object using :py:meth:`~modin.core.storage_formats.base.query_compiler.BaseQueryCompiler.to_pandas`. After that, `pandas` writes the data to the file using
the corresponding method (``pandas.DataFrame.to_csv()`` in this case).
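
The ingress and egress paths can be sketched with the standard library alone. This is an illustration under stated assumptions, not Modin code: ``ToyDispatcher``, ``ToyFactory``, ``ToyIO`` and ``ToyQueryCompiler`` are hypothetical names, the ``csv`` module stands in for `pandas`, and ``from_rows`` / ``to_rows`` play the roles that ``from_pandas`` and ``to_pandas`` have in the description above.

```python
import csv
import io
from typing import List


class ToyQueryCompiler:
    """Hypothetical query compiler wrapping in-memory rows."""

    def __init__(self, rows: List[List[str]]):
        self._rows = rows

    def to_rows(self) -> List[List[str]]:
        # Stand-in for ``to_pandas``: hand back a plain in-memory object.
        return [list(row) for row in self._rows]


class ToyIO:
    """Hypothetical IO class: parsing and writing are delegated to ``csv``."""

    @classmethod
    def from_rows(cls, rows: List[List[str]]) -> ToyQueryCompiler:
        # Stand-in for ``from_pandas``: wrap the parsed object.
        return ToyQueryCompiler(rows)

    @classmethod
    def read_csv(cls, buffer) -> ToyQueryCompiler:
        # Ingress: the backing library reads first, then the result is wrapped.
        return cls.from_rows(list(csv.reader(buffer)))

    @classmethod
    def to_csv(cls, query_compiler: ToyQueryCompiler, buffer) -> None:
        # Egress: unwrap first, then let the backing library write.
        writer = csv.writer(buffer, lineterminator="\n")
        writer.writerows(query_compiler.to_rows())


class ToyFactory:
    """Hypothetical factory: exposes the IO class for one execution."""

    io_cls = ToyIO


class ToyDispatcher:
    """Hypothetical dispatcher: selects the factory for an execution name."""

    _factories = {"ToyOnPython": ToyFactory}

    @classmethod
    def get_factory(cls, execution: str):
        return cls._factories[execution]


factory = ToyDispatcher.get_factory("ToyOnPython")

source = io.StringIO("a,b\n1,2\n3,4\n")
query_compiler = factory.io_cls.read_csv(source)

sink = io.StringIO()
factory.io_cls.to_csv(query_compiler, sink)
```

The point of the sketch is the order of the steps: on ingress the backing library reads before anything is wrapped, and on egress the query compiler is unwrapped before anything is written.
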