Apache SINGA
A distributed deep learning platform.
Protected Member Functions | |
virtual Msg * | HandleGet (Msg **msg) |
Process GET request. More... | |
const std::vector< Msg * > | HandleUpdate (Msg **msg) |
Process Update request. More... | |
virtual Msg * | HandlePut (Msg **msg) |
Process PUT request. More... | |
virtual Msg * | HandleSyncRequest (Msg **msg) |
Handle sync request from other server groups. More... | |
void | HandleSyncResponse (Msg **msg) |
Handle sync response. More... | |
Protected Attributes | |
int | thread_id_ |
int | grp_id_ |
int | id_ |
Updater * | updater_ |
Map from slice ID to slice; the entries are deleted in the destructor | |
std::unordered_map< int, ParamEntry * > | shard_ |
std::vector< int > | slice2group_ |
std::vector< int > | slice2server_ |
Number of updates since the last sync with the master server group, for a param/slice | |
std::vector< int > | nUpdates_ |
Number of sync requests that have not yet been answered | |
std::vector< int > | nPendingSync_ |
std::vector< Blob< float > > | last_sync_ |
std::unordered_map< int, std::vector< Msg * > > | buffer_requests_ |
Process GET request.
Process PUT request.
Handle sync request from other server groups.
It adds the updates of a Param (slice) received from other server groups directly to the local Param (slice). Currently, each Param (slice) has a master group, i.e., slice2group_[sliceid], which receives such requests from all other server groups for that Param object.
msg | request msg containing the parameter updates |
protected |
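The accumulation described above can be sketched as follows. This is a minimal illustration, not SINGA's implementation: SliceSketch and ApplySyncRequest are hypothetical names, and the Param (slice) is reduced to a plain vector of floats instead of a Msg payload.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a Param slice held by the master group.
// It is not SINGA's ParamEntry.
struct SliceSketch {
  std::vector<float> values;
};

// Adds the updates carried by a sync request directly to the master copy
// and returns the refreshed values, which a response could carry back.
std::vector<float> ApplySyncRequest(SliceSketch* master,
                                    const std::vector<float>& delta) {
  assert(master->values.size() == delta.size());
  for (std::size_t i = 0; i < delta.size(); ++i) {
    master->values[i] += delta[i];
  }
  return master->values;
}
```

With a master slice holding {1.0, 2.0} and an incoming update of {0.5, -1.0}, the slice becomes {1.5, 1.0}.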
Handle sync response.
The response msg includes the latest values of a Param object for which this server sent a sync request to the master/maintainer group. The local Param values are replaced with the sum of the received Param values and the local updates made since the sync request was sent.
response | message |
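A minimal sketch of that merge step is given below. It assumes the server kept a snapshot of the local values at the moment the sync request was sent, so that the local updates made since then can be recovered as a difference. MergeSyncResponse and its parameter names are hypothetical and do not appear in SINGA.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Returns received + (local_now - local_at_request), element by element.
// local_at_request is an assumed snapshot taken when the sync request
// was sent; the difference is the set of local updates made since then.
std::vector<float> MergeSyncResponse(
    const std::vector<float>& local_now,
    const std::vector<float>& local_at_request,
    const std::vector<float>& received) {
  assert(local_now.size() == received.size());
  assert(local_at_request.size() == received.size());
  std::vector<float> merged(received.size());
  for (std::size_t i = 0; i < received.size(); ++i) {
    merged[i] = received[i] + (local_now[i] - local_at_request[i]);
  }
  return merged;
}
```

For example, if the local values moved from {2, 4} to {3, 4} while the request was in flight and the master returned {10, 20}, the merged result is {11, 20}.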
Process Update request.
It waits until it has received the gradients from all workers in the same worker group. After updating, it responds to each sender with the new Param values. It may also generate a sync message to the server group that maintains the global version of the updated Param (slice).
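The waiting behavior can be sketched roughly as below. This is an illustration under assumptions, not the actual server code: UpdateBufferSketch is a hypothetical class loosely modeled on buffer_requests_, gradients are plain float vectors, and the sketch only sums them instead of invoking an Updater. It counts arrivals per slice and does not track which worker group sent each request.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_map>
#include <vector>

// Hypothetical per-slice buffer of pending gradients.
class UpdateBufferSketch {
 public:
  explicit UpdateBufferSketch(std::size_t group_size)
      : group_size_(group_size) {}

  // Buffers one gradient for slice_id. Returns false while fewer than
  // group_size gradients have arrived. On the last arrival it writes the
  // element-wise sum to *summed, clears the buffer, and returns true.
  bool Add(int slice_id, const std::vector<float>& grad,
           std::vector<float>* summed) {
    std::vector<std::vector<float>>& pending = pending_[slice_id];
    pending.push_back(grad);
    if (pending.size() < group_size_) return false;
    summed->assign(grad.size(), 0.0f);
    for (const std::vector<float>& g : pending) {
      assert(g.size() == grad.size());
      for (std::size_t i = 0; i < g.size(); ++i) (*summed)[i] += g[i];
    }
    pending_.erase(slice_id);  // ready for the next round of requests
    return true;
  }

 private:
  std::size_t group_size_;
  std::unordered_map<int, std::vector<std::vector<float>>> pending_;
};
```

With a group size of 2, the first call to Add for a slice returns false and the second returns true with the summed gradient, after which the server would apply the update and reply to each buffered sender.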
Note: there is no counter for each worker group on the number of received update requests. Hence it is possible that the server would conduct the update when it receives x requests from group a and y requests from group b where x + y = group size. To avoid this problem, we can