Commit 628c7fb

WEB: Remove Roadmap points pending a PDEP section from Roadmap (#61892)
1 parent 68644ac commit 628c7fb

1 file changed: +0 -140 lines changed

web/pandas/about/roadmap.md

Lines changed: 0 additions & 140 deletions
@@ -34,143 +34,3 @@ For more information about PDEPs, and how to submit one, please refer to
</ul>

{% endfor %}

## Roadmap points pending a PDEP

<div class="alert alert-warning" role="alert">
pandas is in the process of moving roadmap points to PDEPs (implemented in
August 2022). During the transition, some roadmap points will exist as PDEPs,
while others will exist as sections below.
</div>

### Extensibility

pandas extension types (`extending.extension-types`) allow
for extending NumPy types with custom data types and array storage.
pandas uses extension types internally, and provides an interface for
third-party libraries to define their own custom data types.

Many parts of pandas still unintentionally convert data to a NumPy
array. These problems are especially pronounced for nested data.

We'd like to improve the handling of extension arrays throughout the
library, making their behavior more consistent with the handling of
NumPy arrays. We'll do this by cleaning up pandas' internals and
adding new methods to the extension array interface.
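
As a rough illustration of the difference between extension-aware code paths and a conversion to NumPy, the sketch below uses the nullable `Int64` dtype. It is only an example under the assumption that a reasonably recent pandas is installed; the variable names are illustrative, and the dtype of the converted array is deliberately not stated because it may differ between pandas versions.

```python
import numpy as np
import pandas as pd

# A nullable integer array: an extension array with its own dtype and storage.
arr = pd.array([1, 2, None], dtype="Int64")
ser = pd.Series(arr)

# Extension-aware operations keep the extension dtype and the missing value.
doubled = ser * 2

# Converting to NumPy leaves the extension type system behind: the result is
# a plain ndarray, and its dtype depends on the pandas version in use.
as_numpy = np.asarray(arr)

print(ser.dtype)
print(doubled.dtype)
print(type(as_numpy).__name__, as_numpy.dtype)
```

Code that silently performs the last step is what the paragraph above calls an unintentional conversion.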

### Apache Arrow interoperability

[Apache Arrow](https://arrow.apache.org) is a cross-language development
platform for in-memory data. The Arrow logical types are closely aligned
with typical pandas use cases.

We'd like to provide better-integrated support for Arrow memory and
data types within pandas. This will let us take advantage of its I/O
capabilities and provide for better interoperability with other
languages and libraries using Arrow.
71-
72-
### Decoupling of indexing and internals
73-
74-
The code for getting and setting values in pandas' data structures
75-
needs refactoring. In particular, we must clearly separate code that
76-
converts keys (e.g., the argument to `DataFrame.loc`) to positions from
77-
code that uses these positions to get or set values. This is related to
78-
the proposed BlockManager rewrite. Currently, the BlockManager sometimes
79-
uses label-based, rather than position-based, indexing. We propose that
80-
it should only work with positional indexing, and the translation of
81-
keys to positions should be entirely done at a higher level.
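
The separation described above can be approximated from user code with existing public API. This is a minimal sketch, not how pandas implements `.loc` internally: `Index.get_indexer` translates labels to positions (it assumes a unique index), and `.iloc` then works with positions only. The frame and key names are made up for the example.

```python
import pandas as pd

df = pd.DataFrame({"x": [10, 20, 30]}, index=["a", "b", "c"])
keys = ["b", "c"]

# Step 1: translate labels to integer positions, touching only the Index.
# get_indexer marks labels that are not present with -1.
positions = df.index.get_indexer(keys)
if (positions == -1).any():
    raise KeyError(f"labels not found: {keys}")

# Step 2: fetch the values using positions only; no labels are involved here.
result = df.iloc[positions]
```

The result matches `df.loc[keys]`, but each step can be reasoned about and tested independently.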

Indexing is a complicated API with many subtleties. This refactor will require care
and attention. The following principles should inspire refactoring of indexing code and
should result in cleaner, simpler, and more performant code.

1. Label indexing must never involve looking in an axis twice for the same label(s).
    This implies that any validation step must either:

    * limit validation to general features (e.g. dtype/structure of the key/index), or
    * reuse the result for the actual indexing.

2. Indexers must never rely on an explicit call to other indexers.
    For instance, it is OK to have some internal method of `.loc` call some
    internal method of `__getitem__` (or of their common base class),
    but never in the code flow of `.loc` should `the_obj[something]` appear.

3. Execution of positional indexing must never involve labels (as currently, sadly, happens).
    That is, the code flow of a getter call (or a setter call in which the right-hand side is non-indexed)
    to `.iloc` should never involve the axes of the object in any way.

4. Indexing must never involve accessing/modifying values (i.e., act on `._data` or `.values`) more than once.
    The following steps must hence be clearly decoupled:

    * find positions we need to access/modify on each axis
    * (if we are accessing) derive the type of object we need to return (dimensionality)
    * actually access/modify the values
    * (if we are accessing) construct the return object

5. As a corollary to the decoupling between 4.i and 4.iii, any code which deals with how data is stored
    (including any combination of handling multiple dtypes, sparse storage, categoricals, third-party types)
    must be independent from code that deals with identifying affected rows/columns,
    and take place only once step 4.i is completed.

    * In particular, such code should most probably not live in `pandas/core/indexing.py`
    * ... and must not depend in any way on the type(s) of axes (e.g. no `MultiIndex` special cases)

6. As a corollary to point 1.i, `Index` (sub)classes must provide separate methods for any desired validity check of label(s) which does not involve actual lookup,
    on the one side, and for any required conversion/adaptation/lookup of label(s), on the other.

7. Use of trial and error should be limited, and in any case restricted to catching only exceptions
    which are actually expected (typically `KeyError`).

    * In particular, code should never (intentionally) raise new exceptions in the `except` portion of a `try... except`

8. Any code portion which is not specific to setters and getters must be shared,
    and when small differences in behavior are expected (e.g. getting with `.loc` raises for
    missing labels, setting still doesn't), they can be managed with a specific parameter.
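
Several of these principles can be illustrated with a toy container in plain Python. Everything here (`LabelledStore`, `_locate`, `allow_missing`) is hypothetical and unrelated to pandas' actual internals; it is a sketch of the ideas in points 1, 7, and 8 only: one lookup per label, catching only the expected `KeyError`, and a shared code path for getting and setting that differs by a single parameter.

```python
class LabelledStore:
    """Toy labelled container; labels map to positions in exactly one place."""

    def __init__(self, labels, values):
        self._positions = {label: i for i, label in enumerate(labels)}
        self._values = list(values)

    def _locate(self, label, *, allow_missing=False):
        # Shared by the getter and the setter (point 8). The label is looked
        # up once (point 1) and only the expected KeyError is caught (point 7).
        try:
            return self._positions[label]
        except KeyError:
            if allow_missing:
                return None
            raise  # re-raise the original exception, do not create a new one

    def get(self, label):
        # Getting raises for missing labels.
        return self._values[self._locate(label)]

    def set(self, label, value):
        # Setting enlarges the container for missing labels.
        pos = self._locate(label, allow_missing=True)
        if pos is None:
            self._positions[label] = len(self._values)
            self._values.append(value)
        else:
            self._values[pos] = value


store = LabelledStore(["a", "b"], [1, 2])
store.set("b", 20)   # overwrite an existing label
store.set("c", 3)    # enlarge with a new label
```

After translating a label to a position, `get` and `set` touch the stored values exactly once, which mirrors the decoupling asked for in point 4.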

### Numba-accelerated operations

[Numba](https://numba.pydata.org) is a JIT compiler for Python code.
We'd like to provide ways for users to apply their own Numba-jitted
functions where pandas accepts user-defined functions (for example,
`Series.apply`, `DataFrame.apply`, `DataFrame.applymap`, and in groupby and
window contexts). This will improve the performance of
user-defined functions in these operations by staying within compiled
code.

### Documentation improvements

We'd like to improve the content, structure, and presentation of the
pandas documentation. Some specific goals include:

- Overhaul the HTML theme with a modern, responsive design
  (`15556`)
- Improve the "Getting Started" documentation, designing and writing
  learning paths for users with different backgrounds (e.g. brand new to
  programming, familiar with other languages like R, already familiar
  with Python).
- Improve the overall organization of the documentation and specific
  subsections of the documentation to make navigation and finding
  content easier.
157-
### Performance monitoring
158-
159-
Pandas uses [airspeed velocity](https://asv.readthedocs.io/en/stable/)
160-
to monitor for performance regressions. ASV itself is a fabulous tool,
161-
but requires some additional work to be integrated into an open source
162-
project's workflow.
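
For readers unfamiliar with ASV, a benchmark is an ordinary Python class: ASV calls `setup` before timing and times any method whose name starts with `time_`. The class below is a hypothetical sketch written for this example, not a benchmark taken from the pandas suite; the class name, data size, and key are assumptions.

```python
import numpy as np
import pandas as pd


class SeriesLocLookup:
    """Hypothetical ASV-style benchmark for a scalar label lookup."""

    def setup(self):
        # Runs before timing, so data construction is not measured.
        n = 100_000
        self.ser = pd.Series(np.arange(n), index=np.arange(n))
        self.key = n // 2

    def time_loc_scalar(self):
        # The body of a time_* method is what ASV measures.
        self.ser.loc[self.key]
```

Outside of ASV the class can be exercised by hand by calling `setup()` and then `time_loc_scalar()`, which is a quick way to check that a benchmark runs at all.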

The [asv-runner](https://github.com/asv-runner) organization, currently
made up of pandas maintainers, provides tools built on top of ASV. We
have a physical machine for running a number of projects' benchmarks,
and tools managing the benchmark runs and reporting on results.

We'd like to fund improvements and maintenance of these tools to:

- Make them more stable. Currently, they're maintained on the nights and
  weekends when a maintainer has free time.
- Tune the system for benchmarks to improve stability, following
  <https://pyperf.readthedocs.io/en/latest/system.html>
- Build a GitHub bot to request ASV runs *before* a PR is merged.
  Currently, the benchmarks are only run nightly.
