Create an RDD with the number of partitions (`numSlices`) greater than the value of [spark.shuffle.sort.bypassMergeThreshold](../configuration-properties.md#spark.shuffle.sort.bypassMergeThreshold) configuration property.
val res11: Class[_ <: org.apache.spark.Aggregator[Int,Int,Int]] = class org.apache.spark.Aggregator
+```

-// the number of reduce partitions < spark.shuffle.sort.bypassMergeThreshold
-scala> shuffleDep.partitioner.numPartitions
-res4: Int = 2
+Note the number of reduce partitions that is smaller than [spark.shuffle.sort.bypassMergeThreshold](../configuration-properties.md#spark.shuffle.sort.bypassMergeThreshold) configuration property.
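
To make the note above concrete, here is a simplified sketch of the kind of check that compares the number of reduce partitions against the threshold. `Dep`, `defaultThreshold` and `shouldBypassMergeSort` are hypothetical stand-ins written for this example, not Spark's own classes. The default of 200, the inclusive comparison and the map-side-combine condition are assumptions that should be verified against the Spark version in use.

```scala
// A simplified model for illustration only, not Spark's actual code.
object BypassMergeSortSketch {
  // Hypothetical stand-in for the two shuffle dependency properties that matter here
  final case class Dep(numReducePartitions: Int, mapSideCombine: Boolean)

  // Assumed default of spark.shuffle.sort.bypassMergeThreshold
  val defaultThreshold: Int = 200

  // Assumption: bypassing is possible only without map-side combine and
  // when the number of reduce partitions does not exceed the threshold.
  def shouldBypassMergeSort(dep: Dep, threshold: Int = defaultThreshold): Boolean =
    !dep.mapSideCombine && dep.numReducePartitions <= threshold
}
```

With the two reduce partitions from the removed REPL session, `BypassMergeSortSketch.shouldBypassMergeSort(BypassMergeSortSketch.Dep(2, mapSideCombine = false))` holds under these assumptions.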
`fetchContinuousBlocksInBatch` reads the following configuration properties to determine whether continuous shuffle block fetching could be used or not:
* [supportsRelocationOfSerializedObjects](../serializer/Serializer.md#supportsRelocationOfSerializedObjects) (of the [Serializer](../rdd/ShuffleDependency.md#serializer) of the [ShuffleDependency](BaseShuffleHandle.md#dependency) of this [BaseShuffleHandle](#handle))
+
+`fetchContinuousBlocksInBatch` prints out the following DEBUG message when continuous shuffle block fetching is requested yet not satisfied by the configuration:
+
+```text
+The feature tag of continuous shuffle block fetching is set to true, but
+we can not enable the feature because other conditions are not satisfied.
File: docs/shuffle/ShuffleHandle.md (6 additions & 1 deletion)
@@ -1,3 +1,8 @@
+---
+tags:
+- DeveloperApi
+---
+
# ShuffleHandle

`ShuffleHandle` is an abstraction of [shuffle handles](#implementations) for [ShuffleManager](ShuffleManager.md) to pass information about shuffles to tasks.
@@ -14,5 +19,5 @@

* <span id="shuffleId"> Shuffle ID

-!!! note "Abstract Class"
+??? note "Abstract Class"
    `ShuffleHandle` is an abstract class and cannot be created directly. It is created indirectly for the [concrete ShuffleHandles](#implementations).
File: docs/shuffle/index.md (1 addition & 1 deletion)
@@ -1,6 +1,6 @@
# Shuffle System

-**Shuffle System** is a core service of Apache Spark that is responsible for shuffle blocks.
+**Shuffle System** is one of the core services of Apache Spark that is responsible for shuffle blocks (of data).

The main core abstraction is [ShuffleManager](ShuffleManager.md) with [SortShuffleManager](SortShuffleManager.md) as the default and only known implementation.
`sendRequest` creates a new [BlockFetchingListener](../core/BlockFetchingListener.md) to be notified about [successes](#onBlockFetchSuccess) or [failures](#onBlockFetchFailure) of shuffle block fetch requests.
On [onBlockFetchSuccess](../core/BlockFetchingListener.md#onBlockFetchSuccess) the `BlockFetchingListener` adds a `SuccessFetchResult` to the [results](#results) registry and prints out the following DEBUG message to the logs (when not a [zombie](#isZombie)):
@@ -128,15 +128,15 @@ In the end, `onBlockFetchSuccess` prints out the following TRACE message to the
On [onBlockFetchFailure](../core/BlockFetchingListener.md#onBlockFetchFailure) the `BlockFetchingListener` adds a `FailureFetchResult` to the [results](#results) registry and prints out the following ERROR message to the logs:

```text
Failed to get block(s) from [host]:[port]
```

-## <span id="results"> FetchResults
+## FetchResults { #results }

```scala
results: LinkedBlockingQueue[FetchResult]
@@ -165,7 +165,7 @@ For local blocks, `FetchResult`s are added in [fetchHostLocalBlock](#fetchHostLo

Cleaned up in [cleanup](#cleanup)

-## <span id="hasNext"> hasNext
+## hasNext { #hasNext }

```scala
hasNext: Boolean
@@ -175,7 +175,7 @@ hasNext: Boolean

`hasNext` is `true` when [numBlocksProcessed](#numBlocksProcessed) is below [numBlocksToFetch](#numBlocksToFetch).
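
The `hasNext` rule above can be illustrated with a minimal sketch. `CountingIterator` is a hypothetical class written for this example; it mirrors only the two counters named in the text and is not the real `ShuffleBlockFetcherIterator`.

```scala
object HasNextSketch {
  // Hypothetical minimal iterator that tracks only the two counters named above
  final class CountingIterator[A](blocks: IndexedSeq[A]) extends Iterator[A] {
    // Total number of blocks this iterator is expected to produce
    private val numBlocksToFetch: Int = blocks.size
    // Number of blocks already handed out by next()
    private var numBlocksProcessed: Int = 0

    // true while there are still blocks that have not been processed
    override def hasNext: Boolean = numBlocksProcessed < numBlocksToFetch

    override def next(): A = {
      if (!hasNext) throw new NoSuchElementException("no more blocks")
      val block = blocks(numBlocksProcessed)
      numBlocksProcessed += 1
      block
    }
  }
}
```

An iterator over two blocks reports `hasNext` as `true` until `next` has been called twice.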
`next` is part of the `Iterator` ([Scala]({{ scala.api }}/scala/collection/Iterator.html#next():A)) abstraction (to produce the next element of this iterator).
`ShuffleBlockFetcherIterator` creates a [ShuffleFetchCompletionListener](ShuffleFetchCompletionListener.md) when [created](#creating-instance).

`ShuffleFetchCompletionListener` is used when [initialize](#initialize) and [toCompletionIterator](#toCompletionIterator).

-## <span id="cleanup"> Cleaning Up
+## Cleaning Up { #cleanup }

```scala
cleanup(): Unit
@@ -240,7 +240,7 @@ cleanup(): Unit

`cleanup` iterates over [results](#results) internal queue and for every `SuccessFetchResult`, increments remote bytes read and blocks fetched shuffle task metrics, and eventually releases the managed buffer.

-## <span id="bytesInFlight"> bytesInFlight
+## bytesInFlight { #bytesInFlight }

The bytes of fetched remote shuffle blocks in flight

@@ -250,7 +250,7 @@ Incremented every [sendRequest](#sendRequest) and decremented every [next](#next

`ShuffleBlockFetcherIterator` makes sure that the invariant of `bytesInFlight` is below [maxBytesInFlight](#maxBytesInFlight) every [remote shuffle block fetch](#fetchUpToMaxBytes).

-## <span id="reqsInFlight"> reqsInFlight
+## reqsInFlight { #reqsInFlight }

The number of remote shuffle block fetch requests in flight.

@@ -260,7 +260,7 @@ Incremented every [sendRequest](#sendRequest) and decremented every [next](#next

`ShuffleBlockFetcherIterator` makes sure that the invariant of `reqsInFlight` is below [maxReqsInFlight](#maxReqsInFlight) every [remote shuffle block fetch](#fetchUpToMaxBytes).
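
The two in-flight limits (`bytesInFlight` against `maxBytesInFlight`, `reqsInFlight` against `maxReqsInFlight`) can be sketched together as a small throttle. `FetchRequest`, `Throttle`, `canSend` and `onNext` are hypothetical names for this example. The inclusive comparisons, the per-request decrement and the rule that a request is always allowed when nothing is in flight are assumptions, not confirmed details of Spark's implementation.

```scala
object InFlightSketch {
  // Hypothetical request descriptor: only the size matters for this sketch
  final case class FetchRequest(size: Long)

  final class Throttle(maxBytesInFlight: Long, maxReqsInFlight: Int) {
    private var bytesInFlight: Long = 0L
    private var reqsInFlight: Int = 0

    // Assumption: a request may be sent when nothing is in flight, or when
    // sending it keeps both counters within their limits.
    def canSend(req: FetchRequest): Boolean =
      bytesInFlight == 0 ||
        (reqsInFlight + 1 <= maxReqsInFlight &&
          bytesInFlight + req.size <= maxBytesInFlight)

    // Both counters are incremented when a request is sent
    def sendRequest(req: FetchRequest): Unit = {
      bytesInFlight += req.size
      reqsInFlight += 1
    }

    // Simplification: both counters are decremented once per consumed request
    def onNext(req: FetchRequest): Unit = {
      bytesInFlight -= req.size
      reqsInFlight -= 1
    }
  }
}
```

In this model a caller checks `canSend` before every `sendRequest` and defers the request otherwise, which is what keeps both counters within their limits.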

-## <span id="isZombie"> isZombie
+## isZombie { #isZombie }

Controls whether `ShuffleBlockFetcherIterator` is still active and records `SuccessFetchResult`s on [successful shuffle block fetches](#onBlockFetchSuccess).

@@ -270,11 +270,11 @@ Enabled (`true`) in [cleanup](#cleanup).

When enabled, [registerTempFileToClean](#registerTempFileToClean) is a noop.
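
The role of the `isZombie` flag can be sketched as follows. `Fetcher` and this trimmed-down `SuccessFetchResult` are hypothetical types for the example. The real iterator tracks far more state, and its `cleanup` also updates metrics and releases buffers rather than simply clearing the queue.

```scala
import java.util.concurrent.LinkedBlockingQueue

object ZombieSketch {
  sealed trait FetchResult
  // Hypothetical, reduced result type: only the block ID is kept
  final case class SuccessFetchResult(blockId: String) extends FetchResult

  final class Fetcher {
    val results = new LinkedBlockingQueue[FetchResult]()
    private var isZombie: Boolean = false

    // Successful fetches are recorded only while the fetcher is still active
    def onBlockFetchSuccess(blockId: String): Unit = this.synchronized {
      if (!isZombie) results.put(SuccessFetchResult(blockId))
    }

    // Marks the fetcher as a zombie; clearing the queue is a simplification
    def cleanup(): Unit = this.synchronized {
      isZombie = true
      results.clear()
    }
  }
}
```

After `cleanup`, late callbacks no longer add anything to `results` in this model.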