Disconnect instantly while reconnecting #197
                
     Open
            
            
          
  Add this suggestion to a batch that can be applied as a single commit.
  This suggestion is invalid because no changes were made to the code.
  Suggestions cannot be applied while the pull request is closed.
  Suggestions cannot be applied while viewing a subset of changes.
  Only one suggestion per line can be applied in a batch.
  Add this suggestion to a batch that can be applied as a single commit.
  Applying suggestions on deleted lines is not supported.
  You must change the existing code in this line in order to create a valid suggestion.
  Outdated suggestions cannot be applied.
  This suggestion has been applied or marked resolved.
  Suggestions cannot be applied from pending reviews.
  Suggestions cannot be applied on multi-line comments.
  Suggestions cannot be applied while the pull request is queued to merge.
  Suggestion cannot be applied right now. Please check back later.
  
    
  
    
We noticed that trying to shut down clients in reconnecting-state may take several minutes before the
on_disconnectcallback gets called. The reason appears to be that when theconnectionhas nochannelwhen it has no network, the disconnection request will not be handled until the next scheduled reconnect-task is run. This may take up to 128 seconds, which is a bit long for e.g. demanding people to wait until they can reboot their computer. In addition to this, if the client is running as a Windows service and the computer is in S0 low-power idle state, the service process gets only a fraction of normal CPU time, and it may take over 30 minutes before the reconnect task gets run.It looks like the disconnection task is heavily tied with a
channelof theconnection, which is not available when there's no network, so currently we wait until a reconnect-task tries to create a newchannel. However once the disconnect-request has been made, the newchannelwill be only used for callings_mqtt_client_shutdown, which is to be called after thechannelhas completed its shutdown, so having achannelshould not be needed?This patch skips the wait for the reconnect-task, and instead lets the disconnect-task to call
s_mqtt_client_shutdowndirectly in case there's no channel. This shutdown function was moved from client_channel_handler.c to client.c to get access tostatic void s_mqtt_client_shutdown()defined within client.c. It also feels like it fits there almost better than the channel-file, as the only link to channels appears to be thats_mqtt_disconnect_taskshould ensure that the channel has been shut down befores_mqtt_client_shutdowngets called.Testing this did not show any obviuous side-effects even for the case where reconnect task gets run after disconnect, as the call
aws_atomic_store_ptr(&connection->reconnect_task->connection_ptr, NULL)froms_mqtt_disconnect_taskappears to cause the reconnect tasks_attempt_reconnectto bail out due to connection pointer being NULL.One worry pointed out by @kankri about this patch is the small but possible chance that the
s_mqtt_disconnect_taskands_attempt_reconnecttasks would get run simultaneously by two separate aws event loop threads, and thens_attempt_reconnectacquires a pointer toconnectionwhose content might get freed and/or invalidated bys_mqtt_disconnect_taskbefores_attempt_reconnectis done. But I guess this is already a potential problem, because the comment "If there is an outstanding reconnect task, cancel it" insides_mqtt_disconnect_taskalready assumes that there might be a reconnect task scheduled for the short term future? We could make a separate issue out of that.By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.