identifier used to distinguish this parameter from other parameters
how many partitions will use this parameter
size of the parameter (1D vector)
start index in the original parameter.
Retrieve gradients for the slice of the model that this node is responsible for from all the other nodes. A new thread is created for each separate node. The gradients are summed and then stored in decompressed form in gradientPartition.
the number of gradients to average over (used to normalize the summed gradients).
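To make the aggregation step concrete, here is a minimal stand-alone sketch of the same pattern: one task per remote node fetches that node's gradient chunk for the slice this node owns, the chunks are summed element-wise, and the result is divided by avgNumbers. The blockStore map, its key scheme, and the use of Array[Float] in place of a Tensor are assumptions made to keep the example self-contained; this is not the actual implementation, which works against the Spark block manager with compressed tensors.

```scala
import java.util.concurrent.Executors
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration.Duration

object AggregateGradientsSketch {
  // `blockStore` stands in for the Spark block manager: it maps
  // (sourcePartition, targetPartition) -> the gradient chunk that the source
  // node computed for the slice owned by the target node. The keys and the
  // Array[Float] stand-in for Tensor are assumptions for illustration.
  def aggregate(blockStore: Map[(Int, Int), Array[Float]],
                partitionNum: Int,
                myPartitionId: Int,
                avgNumbers: Int): Array[Float] = {
    val pool = Executors.newFixedThreadPool(partitionNum)
    implicit val ec: ExecutionContext = ExecutionContext.fromExecutorService(pool)
    try {
      // One fetch task per source node.
      val fetches = (0 until partitionNum).map { source =>
        Future(blockStore((source, myPartitionId)))
      }
      val chunks = fetches.map(Await.result(_, Duration.Inf))

      // Element-wise sum of the fetched chunks, then divide by `avgNumbers`.
      val gradientPartition = new Array[Float](chunks.head.length)
      for (chunk <- chunks; i <- chunk.indices) gradientPartition(i) += chunk(i)
      gradientPartition.map(_ / avgNumbers)
    } finally pool.shutdown()
  }
}
```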
Use a fixed thread pool to launch a thread for each partition of the weights. Each thread requests a partition of the weights from the Spark block manager and copies it into localParameter.
The Tensor that will hold the retrieved weights.
A FutureResult which contains a Future for each thread.
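A minimal sketch of this retrieval pattern, under the same simplified model as above: an in-memory blockStore map stands in for the Spark block manager, Array[Float] stands in for Tensor, and a hypothetical chunkStart function stands in for the real offset bookkeeping. The real method returns the per-thread futures (a FutureResult); here the sketch simply blocks until every copy finishes.

```scala
import java.util.concurrent.Executors
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration.Duration

object GetWeightsSketch {
  // `blockStore` (partitionId -> weight chunk) and `chunkStart` (partitionId ->
  // offset in the full parameter vector) are hypothetical stand-ins.
  def getWeights(blockStore: Map[Int, Array[Float]],
                 chunkStart: Int => Int,
                 partitionNum: Int,
                 localParameter: Array[Float]): Unit = {
    val pool = Executors.newFixedThreadPool(partitionNum)
    implicit val ec: ExecutionContext = ExecutionContext.fromExecutorService(pool)
    try {
      // One thread per weight partition: request the chunk and copy it into
      // the matching region of `localParameter`.
      val copies = (0 until partitionNum).map { pid =>
        Future {
          val chunk = blockStore(pid)
          System.arraycopy(chunk, 0, localParameter, chunkStart(pid), chunk.length)
        }
      }
      copies.foreach(Await.result(_, Duration.Inf))
    } finally pool.shutdown()
  }
}
```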
Tensor to hold a slice of the global gradients.
This method should be called on each RDD partition before parameter synchronization begins.
An empty gradient tensor is placed in the block manager that can be used to store gradients.
A 1 / numPartition fraction of the parameter tensor is copied to the block manager as a compressed tensor.
A tensor representing the initial underlying weights of this AllReduceParameter.
start index in the original parameter.
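The following sketch illustrates the initialization step under the same simplified model: it places an empty gradient slice and this node's 1 / numPartition fraction of the parameter into an in-memory store. The key names, types, and the omission of compression are all assumptions made for a runnable example.

```scala
import scala.collection.concurrent.TrieMap

object InitSketch {
  // In-memory stand-in for the Spark block manager; the key scheme is illustrative.
  val blockStore = TrieMap.empty[String, Array[Float]]

  // Place an empty gradient slice and this node's fraction of the weights into
  // the store. `start` and `length` describe the slice owned by `partitionId`.
  // The real code stores the weight slice in compressed form, omitted here.
  def init(parameter: Array[Float], partitionId: Int, start: Int, length: Int): Unit = {
    blockStore.put(s"gradients-$partitionId", new Array[Float](length))
    blockStore.put(s"weights-$partitionId", parameter.slice(start, start + length))
  }
}
```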
Slice the gradients learned from this partition of data into chunks, mark each chunk with the parameter node it should be sent to, and put it in the block manager.
A Tensor that contains gradients computed on the entire model on a single partition of data.
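A hedged sketch of this slicing-and-publishing step: the gradient computed on this data partition is cut into partitionNum chunks, and each chunk is stored under a key naming the parameter node that owns it. The chunkRange helper and the key scheme are assumptions for illustration, not the library's API.

```scala
import scala.collection.concurrent.TrieMap

object PutGradientsSketch {
  // In-memory stand-in for the Spark block manager.
  val blockStore = TrieMap.empty[(Int, Int), Array[Float]]

  // Slice the full gradient computed on this data partition into `partitionNum`
  // chunks and store each chunk under a key that names its target parameter
  // node. `chunkRange` (target id -> (start, length)) is a hypothetical helper
  // mirroring the slicing scheme described in the class documentation.
  def putGradients(gradient: Array[Float],
                   myPartitionId: Int,
                   partitionNum: Int,
                   chunkRange: Int => (Int, Int)): Unit = {
    (0 until partitionNum).foreach { target =>
      val (start, length) = chunkRange(target)
      blockStore.put((myPartitionId, target), gradient.slice(start, start + length))
    }
  }
}
```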
Put the portion of the weights that this partition is responsible for into the block manager. Weights are placed locally, then pulled when needed by other partitions.
size of the parameter (1D vector)
Tensor to hold a slice of the global weights.
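For completeness, a small sketch of publishing the locally owned weight slice so that other partitions can pull it later (for example, during the getWeights step). The in-memory blockStore again stands in for the Spark block manager and is an assumption, not the real API.

```scala
import scala.collection.concurrent.TrieMap

object SendWeightPartitionSketch {
  // In-memory stand-in for the Spark block manager.
  val blockStore = TrieMap.empty[Int, Array[Float]]

  // Publish the weight slice this partition owns under its partition id;
  // other partitions pull it later when they assemble the full model.
  def sendWeightPartition(weightPartition: Array[Float], partitionId: Int): Unit =
    blockStore.put(partitionId, weightPartition.clone())
}
```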
Represents parameters stored on the block manager. In distributed optimization, we put parameters on Spark's block manager, and each worker syncs parameters through it. The block manager here serves as a parameter server.
A Tensor is sliced into partitionNum chunks, and each chunk is assigned to a particular node (Spark executor). Likewise, gradients for each chunk are also assigned and stored on separate nodes. In this way, gradient aggregation and parameter updates can be performed independently for each chunk on separate nodes.
Tensor element type
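To illustrate the slicing scheme, here is a small self-contained sketch that computes the (start, length) range owned by each of the partitionNum chunks. The remainder-distribution policy shown (the first size % partitionNum chunks get one extra element) is an assumption for illustration and may differ from the actual implementation.

```scala
object SlicingSketch {
  /** Returns (startIndex, length) of the chunk owned by `partitionId`.
    * Assumes an even split with the remainder spread over the first chunks. */
  def chunkRange(size: Int, partitionNum: Int, partitionId: Int): (Int, Int) = {
    val base = size / partitionNum
    val extra = size % partitionNum
    val length = if (partitionId < extra) base + 1 else base
    val start = partitionId * base + math.min(partitionId, extra)
    (start, length)
  }

  def main(args: Array[String]): Unit = {
    val size = 10
    val partitionNum = 3
    (0 until partitionNum).foreach { id =>
      val (start, len) = chunkRange(size, partitionNum, id)
      println(s"partition $id owns [$start, ${start + len})")
    }
    // prints: partition 0 owns [0, 4), partition 1 owns [4, 7), partition 2 owns [7, 10)
  }
}
```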