[threads] Fix race between async suspend and compensation
authorLudovic Henry <ludovic@xamarin.com>
Thu, 22 Oct 2015 20:01:10 +0000 (21:01 +0100)
committerLudovic Henry <ludovic@xamarin.com>
Wed, 28 Oct 2015 14:34:56 +0000 (14:34 +0000)
The race only happens on Posix with unified suspend.

The chain of event is the following:
- T1 suspend the world (calls `sgen_unified_suspend_stop_world`)
- T2 is async suspended
- T2 execute suspend_signal_handler:
 - T2 finish async suspend
 - T2->suspend_can_continue is set to FALSE as the thread is detaching
 - T2 notify T1 via `mono_threads_notify_initiator_of_suspend`
 - T2 is yielded (via the OS scheduler, or forced via a `sleep`)
- T1 returns from `mono_threads_wait_pending_operations`
- T1 does its stuff
- T1 restart the world
- T1 (or any another thread, except for T2) suspend the world (it does not
  necessarily uses `sgen_unified_suspend_stop_world`)
- T2 is async suspended: incrementation of T2->suspend_count to 2, but it is
    not signalled because it is already in ASYNC_SUSPENDED state
- T2 wakes up (for any reason) and execute compensation <- ERROR: suspend_count != 1

This race can happen because when async suspending, we notify the
suspendor before executing the compensation, in this case for a thread
which is detaching. By simply post-ponning the notification, just after
the compensation, we eliminate this race.

mono/utils/mono-threads-posix-signals.c

index 1dfdd38b571102da804e09351499527d0c5c910b..d6b638829288f996cf07e65b70dfa621133b97ec 100644 (file)
@@ -178,7 +178,6 @@ suspend_signal_handler (int _dummy, siginfo_t *info, void *context)
        /* thread_state_init_from_sigctx return FALSE if the current thread is detaching and suspend can't continue. */
        current->suspend_can_continue = ret;
 
-
        /*
        Block the restart signal.
        We need to block the restart signal while posting to the suspend_ack semaphore or we race to sigsuspend,
@@ -186,19 +185,23 @@ suspend_signal_handler (int _dummy, siginfo_t *info, void *context)
        */
        pthread_sigmask (SIG_BLOCK, &suspend_ack_signal_mask, NULL);
 
-       /* We're done suspending */
-       mono_threads_notify_initiator_of_suspend (current);
-
        /* This thread is doomed, all we can do is give up and let the suspender recover. */
        if (!ret) {
                THREADS_SUSPEND_DEBUG ("\tThread is dying, failed to capture state %p\n", current);
                mono_threads_transition_async_suspend_compensation (current);
+
+               /* We're done suspending */
+               mono_threads_notify_initiator_of_suspend (current);
+
                /* Unblock the restart signal. */
                pthread_sigmask (SIG_UNBLOCK, &suspend_ack_signal_mask, NULL);
 
                goto done;
        }
 
+       /* We're done suspending */
+       mono_threads_notify_initiator_of_suspend (current);
+
        do {
                current->signal = 0;
                sigsuspend (&suspend_signal_mask);