Commit 5de3acf

Gordon authored and committed
Issue 22: Add fix to markdown for html rendering.
1 parent 64ac1fb commit 5de3acf

1 file changed

+17
-17
lines changed

affinity/cpp-23/p1437r0.md

Lines changed: 17 additions & 17 deletions
@@ -12,7 +12,7 @@
1212

1313
# Acknowledgements
1414

15-
This paper is the result of discussions from man contributors within the heterogeneous C++ group, including H. Carter Edwards, Thomas Rodgers, Patrice Roy, Carl Cook, Jeff Hammond, Hartmut Kaiser, Christian Trott, Paul Blinzer, Alex Voicu, Nat Goodspeed and Tony Tye.
15+
This paper is the result of discussions from man contributors within the heterogeneous C\+\+ group, including H. Carter Edwards, Thomas Rodgers, Patrice Roy, Carl Cook, Jeff Hammond, Hartmut Kaiser, Christian Trott, Paul Blinzer, Alex Voicu, Nat Goodspeed and Tony Tye.
1616

1717
# Changelog
1818

@@ -36,47 +36,47 @@ Computer systems are no longer homogeneous platforms. From desktop workstations
3636

3737
The way these processors access memory is also far from homogeneous. For example, the system may present a single shared virtual address space [[21]][hmm] [[22]][opencl-svm], or it may have different address spaces mutually inaccessible other than through special functions [[4]][opencl-2-2]. Different memory regions may have different levels of consistency, cache coherency, and support for atomic operations. Different parts of the system may have different access latencies or bandwidths to different memory regions (so-called "NUMA affinity regions") [[2]][hwloc]. Some parts of memory may be persistent. Different systems may configure the same types of memory in different ways around the processors.
3838

39-
In order to program these new systems and the architectures that inhabit them, it's vital that applications are capable of understating both what architectures are available and the properties of those architectures, namely their observable behaviors, capabilities and limitations. However, the current C++ standard provides no way to achieve this, so developers have to rely entirely on third party and operating system libraries.
39+
In order to program these new systems and the architectures that inhabit them, it's vital that applications are capable of understating both what architectures are available and the properties of those architectures, namely their observable behaviors, capabilities and limitations. However, the current C\+\+ standard provides no way to achieve this, so developers have to rely entirely on third party and operating system libraries.
4040

4141
# Goals: what this paper is, and what it is not
4242

43-
This paper seeks to define, within C++, a facility for discovering execution resources available to a system that are capable of executing work, and for querying their properties.
43+
This paper seeks to define, within C\+\+, a facility for discovering execution resources available to a system that are capable of executing work, and for querying their properties.
4444

45-
However, it is not the goal of this proposal to introduce support in the C++ language or the standard library for all of the various heterogeneous architectures available today. The authors of this paper recognize that this is unrealistic as it would require significant changes to the C++ machine model and would be extremely volatile to future developments in architecture and system design.
45+
However, it is not the goal of this proposal to introduce support in the C\+\+ language or the standard library for all of the various heterogeneous architectures available today. The authors of this paper recognize that this is unrealistic as it would require significant changes to the C\+\+ machine model and would be extremely volatile to future developments in architecture and system design.
4646

47-
Instead, it seeks to define a single, unified, and stable layer in the C++ Standard Library. Applications, libraries, and programming models (such as SYCL [[3]][sycl-1-2-1], Kokkos [[19]][kokkos], HPX [[13]][hpx] or TBB [[12]][tbb]) can build on this layer; hardware vendors can support it via standards such as OpenCL [[4]][opencl-2-2], CUDA [[20]][cuda], OpenMP [[6]][openmp-5], MPI [[16]][mpi], Hwloc [[2]][hwloc], HSA [[5]][HSA] and HMM [[21]][hmm]; and it can be extended when necessary.
47+
Instead, it seeks to define a single, unified, and stable layer in the C\+\+ Standard Library. Applications, libraries, and programming models (such as SYCL [[3]][sycl-1-2-1], Kokkos [[19]][kokkos], HPX [[13]][hpx] or TBB [[12]][tbb]) can build on this layer; hardware vendors can support it via standards such as OpenCL [[4]][opencl-2-2], CUDA [[20]][cuda], OpenMP [[6]][openmp-5], MPI [[16]][mpi], Hwloc [[2]][hwloc], HSA [[5]][HSA] and HMM [[21]][hmm]; and it can be extended when necessary.
4848

49-
This layer will not be characterized in terms of specific categories of hardware such as CPUs, GPUs and FPGAs as these are broad concepts that are subject to change over time and have no foundation in the C++ machine model. It will instead define a number of abstract properties of system architectures that are not tied to any specific hardward.
49+
This layer will not be characterized in terms of specific categories of hardware such as CPUs, GPUs and FPGAs as these are broad concepts that are subject to change over time and have no foundation in the C\+\+ machine model. It will instead define a number of abstract properties of system architectures that are not tied to any specific hardward.
5050

51-
The initial set of properties that this paper would propose be defined in the C++ standard library would reflect a generalization of the observable behaviors, capabilities and limitations of common architectures available in heterogeneous and distributed systems today. However the intention is that the interface be extensible so that that vendors can provide their own extensions to provide visibility into the more niche characteristics of certain architectures.
51+
The initial set of properties that this paper would propose be defined in the C\+\+ standard library would reflect a generalization of the observable behaviors, capabilities and limitations of common architectures available in heterogeneous and distributed systems today. However the intention is that the interface be extensible so that that vendors can provide their own extensions to provide visibility into the more niche characteristics of certain architectures.
5252

5353
It is intended that this layer be defined as a natural extension of the Executors proposal, a unified interface for execution. The current executors proposal [[14]][p0443] already provides a route to supporting heterogeneous and distributed systems, however it is missing a way to identify what architectures a system has.
5454

5555
# Motivation
5656

57-
There are many reasons why such a feature within C++ would benefit developers and the C++ ecosystem as a whole, and those can differ from one domain to another. We've attempted to outline some of these benefits here.
57+
There are many reasons why such a feature within C\+\+ would benefit developers and the C\+\+ ecosystem as a whole, and those can differ from one domain to another. We've attempted to outline some of these benefits here.
5858

5959
## Improve performance
6060

6161
The clearest benefit is performance. Exposing, even at an abstract level, the properties of the underlying architecture that a program is running on, allows application and libraries to be fine tuned. This may result in significant performance improvements that would only otherwise be possible via third party or operating system libraries [[1]][design-of-openmp] [[7]][cpuaff] [[8]][memkid] [[9]][solaris-pbind] [[10]][linux-sched-setaffinity] [[11]][windows-set-thread-affinity-mask] [[15]][exposing-locality].
6262

6363
This includes but is not limited to how to structure data to ensure access patterns along with execution on the architecture to achieve coalesced memory access and optimal cache utilization and where to initialize data to make efficient use of hardware locality and process affinity.
6464

65-
There is a general trend to move towards a unified address space in heterogeneous and distributed systems via standards like HMM. However, there are still many architectures that still require distinct address spaces, are not yet in a position to move to a single address space, and may never be. Even if you were to consider a single unified address the ultimate goal for heterogeneous and distributed systems, this actually makes the case for affinity in C++ stronger. As long as different address spaces exist, the distinction between different hardware memory regions and their capabilities is clear, but with a single unified address space, potentially with cache coherency, distinguishing different memory regions becomes much more subtle. Therefore, it becomes much more important to understand the various memory regions and their affinity relationships in order to achieve good performance on various architectures.
65+
There is a general trend to move towards a unified address space in heterogeneous and distributed systems via standards like HMM. However, there are still many architectures that still require distinct address spaces, are not yet in a position to move to a single address space, and may never be. Even if you were to consider a single unified address the ultimate goal for heterogeneous and distributed systems, this actually makes the case for affinity in C\+\+ stronger. As long as different address spaces exist, the distinction between different hardware memory regions and their capabilities is clear, but with a single unified address space, potentially with cache coherency, distinguishing different memory regions becomes much more subtle. Therefore, it becomes much more important to understand the various memory regions and their affinity relationships in order to achieve good performance on various architectures.
6666

6767
## Provide a unified interface
6868

69-
C++ is a major language when it comes to heterogeneous and distributed computing, and while it is a rapidly growing domain, it is still very challenging to develop in. There are a large number of C++ based third party and OS libraries. However, developing for heterogeneous and distributed systems often involves a combination of these libraries, which introduces a number of challenges
69+
C\+\+ is a major language when it comes to heterogeneous and distributed computing, and while it is a rapidly growing domain, it is still very challenging to develop in. There are a large number of C\+\+ based third party and OS libraries. However, developing for heterogeneous and distributed systems often involves a combination of these libraries, which introduces a number of challenges
7070

7171
Firstly it's common that different architectures are discovered via different libraries, You may want to use CUDA for NVidia GPUs, OpenMP for Intel CPUs, SYCL for Intel GPUs, Hwloc for the higher-level nodes and so on. This means that you have to collect together resources discovered from a different libraries, which very often do not provide a consistent representation or any form of interoperability, and find some way for them to represent them in a coherent view.
7272

7373
Secondly, many of these libraries report the same underlying hardware. For example OpenMP, SYCL and Hwloc will all report the same Intel CPU. This means you have to collate the resources together such that resources from different libraries representing the same hardware are joined together, to avoid resource contention.
7474

7575
## Categorize limitations
7676

77-
There are many architectures available within heterogeneous and distributed systems which cannot support the full range of C++ features. This includes, but is not limited to dynamic allocation, recursion, dynamic polymorphism, RTTI, double precision floating point and some forms of atomic operations.
77+
There are many architectures available within heterogeneous and distributed systems which cannot support the full range of C\+\+ features. This includes, but is not limited to dynamic allocation, recursion, dynamic polymorphism, RTTI, double precision floating point and some forms of atomic operations.
7878

79-
It's crucial to allow developers to identify these limitations and which apply to the architecture they are running on, because in many cases if a C++ feature that is not supported on the architecture is used, the application would fail to execute or potentially crash.
79+
It's crucial to allow developers to identify these limitations and which apply to the architecture they are running on, because in many cases if a C\+\+ feature that is not supported on the architecture is used, the application would fail to execute or potentially crash.
8080

8181
## Facilitate generic code
8282

@@ -88,15 +88,15 @@ Having a unified interface for performing this topology discovery and querying t
8888

8989
## Increase accessibility
9090

91-
Providing support for heterogenous and distributed computing as a first-class citizen of C++ will improve its accessibility and increase its utilization in libraries and applications, ultimately making the ecosystem stronger. This will become increasingly more important as heterogeneous and distributed computing becomes crucial to gaining the necessary performance in applications in more domains of C++.
91+
Providing support for heterogenous and distributed computing as a first-class citizen of C\+\+ will improve its accessibility and increase its utilization in libraries and applications, ultimately making the ecosystem stronger. This will become increasingly more important as heterogeneous and distributed computing becomes crucial to gaining the necessary performance in applications in more domains of C\+\+.
9292

9393
## Provide a broader standardization
9494

95-
The C++ standard is in a crucial position for heterogeneous and distributed computing domains. It is the common point between a number of different programming languages, models and libraries targeting a wide range of different architectures. This means that C++ has a unique opportunity to provide a single standard that not only covers the requirements of a single domain, but all of them, allowing for a convergence within the ecosystem and much more interoperability across different architectures.
95+
The C\+\+ standard is in a crucial position for heterogeneous and distributed computing domains. It is the common point between a number of different programming languages, models and libraries targeting a wide range of different architectures. This means that C\+\+ has a unique opportunity to provide a single standard that not only covers the requirements of a single domain, but all of them, allowing for a convergence within the ecosystem and much more interoperability across different architectures.
9696

97-
For example, a unified C++ interface for topology discovery could provide access to GPUs from Nvidia, AMD, Intel, and ARM via their respective open standards or proprietary frameworks. At the same time, it could give access to NUMA-aware systems via Hwloc.
97+
For example, a unified C\+\+ interface for topology discovery could provide access to GPUs from Nvidia, AMD, Intel, and ARM via their respective open standards or proprietary frameworks. At the same time, it could give access to NUMA-aware systems via Hwloc.
9898

99-
Another example of this is that while Hwloc is highly used in many domains, it now does not always accurately represent existing systems. This is because Hwloc presents their topology as strictly hierarchical, which no longer accurately describes many systems. A unified C++ interface does not need to be bound to the limitations of a single library, and can provide a much broader representation of a system's execution resource topology.
99+
Another example of this is that while Hwloc is highly used in many domains, it now does not always accurately represent existing systems. This is because Hwloc presents their topology as strictly hierarchical, which no longer accurately describes many systems. A unified C\+\+ interface does not need to be bound to the limitations of a single library, and can provide a much broader representation of a system's execution resource topology.
100100

101101
# Proposed direction
102102

@@ -122,7 +122,7 @@ Below we outline a proposed direction:
122122
As a result of the above this paper may also:
123123

124124
* Propose a lifetime model for execution agents.
125-
* Propose some additions to the C++ machine model to facilitate describing these additional properties.
125+
* Propose some additions to the C\+\+ machine model to facilitate describing these additional properties.
126126

127127
# Suggested straw polls
128128
