Another Concurrency Bug I Might Detect

The first problem that execution with random delays can catch is a temporal dependency: Thread 1 expects that thread 2 has finished performing a task, but this is not enforced through synchronization.

My girlfriend asked if I could detect the exact opposite: What if thread 1 expects that thread 2 has not yet finished performing some task, but it actually has? This is probably another common problem, especially among novices. It gets particularly bad if it involves wait and notify: one thread waits for another thread’s notification, but that notification was sent before the first thread was ready for it. Here’s an example:

import junit.framework.TestCase;

/**
 * A multithreaded test case exhibiting a problem with synchronization:
 * One thread is waiting for another thread's notify, but that notify
 * has already occurred.
 * @author Mathias Ricken
 */
public class SyncProblem3 extends TestCase {
    public void testNotifyTooEarly() {
        final Character signal = new Character('x');
        Thread worker = new Thread(new Runnable() {
            public void run() {
                System.out.println("Worker thread running");
                try { Thread.sleep(2000); }
                catch(InterruptedException e) { /* ignore */ }
                synchronized(signal) {
                    System.out.println("Worker thread calling notify");
                    signal.notify();
                }
                try { Thread.sleep(3000); }
                catch(InterruptedException e) { /* ignore */ }
                System.out.println("Worker thread done");
            }
        });
        System.out.println("Main thread starting worker thread...");
        worker.start();
        System.out.println("Main thread started worker thread...");
        try {
            synchronized(signal) {
                System.out.println("Main thread waits...");
                signal.wait();
                System.out.println("Main thread woken up");
            }
        }
        catch(InterruptedException e) { /* ignore */ }
        System.out.println("Main thread done");
    }
}

Here, the calls to Thread.sleep simulate performing some computation, of course. Under normal circumstances, the worker thread will call signal.notify() long after the main thread has reached signal.wait(), so the main thread gets woken up.

If, however, the main thread for some reason takes longer than usual to get to signal.wait(), the worker may call signal.notify() before anyone is waiting, and the notification is lost. The correct way of doing this, of course, is to include a flag that’s protected by the lock of the same object that is being used for signaling: The flag is initially false, gets set to true before the call to notify, and wait is only called while the flag is still false. Checking the flag in a loop rather than with a single if also covers spurious wakeups, which Object.wait is allowed to produce.

...
public class SyncProblem3 extends TestCase {
    boolean flag = false;
    public void testNotifyTooEarly() {
        ...
        Thread worker = new Thread(new Runnable() {
            public void run() {
                ...
                synchronized(signal) {
                    flag = true;
                    signal.notify();
                }
                ...
            }
        });
        ...
        try {
            synchronized(signal) {
                while (!flag) {
                    signal.wait();
                }
            }
        }
        ...
    }
}
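
For reference, here is a self-contained sketch of the flag-guarded version that runs outside of JUnit. The class GuardedSignal and its method names are made up for this illustration and are not part of the test above. The demo deliberately forces the bad ordering (the worker notifies before the main thread waits), which is exactly the case that hangs the first version.

```java
/** Minimal sketch of a one-shot signal whose notification cannot be lost. */
class GuardedSignal {
    private final Object lock = new Object();
    private boolean flag = false; // only read or written while holding lock

    /** Records that the event happened, then wakes up any waiting thread. */
    void signal() {
        synchronized (lock) {
            flag = true;
            lock.notifyAll();
        }
    }

    /** Blocks until signal() has been called, even if that happened earlier. */
    void await() throws InterruptedException {
        synchronized (lock) {
            while (!flag) { // the loop also guards against spurious wakeups
                lock.wait();
            }
        }
    }

    /** Forces the "notify too early" ordering; returns true if the caller got through. */
    static boolean demo() {
        final GuardedSignal s = new GuardedSignal();
        Thread worker = new Thread(new Runnable() {
            public void run() { s.signal(); }
        });
        worker.start();
        try {
            worker.join(); // make sure signal() has already happened
            s.await();     // returns immediately because the flag is set
            return true;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("Main thread got through: " + demo());
    }
}
```

With the flag in place, it no longer matters which thread reaches its synchronized block first.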

Note that if the notify is reached first and the notification is lost, then this unit test does not fail but hangs. For this reason, any unit test should have a timeout set, and if the test has not finished executing after the specified time has run out, the test should be considered a failure. An examination of the thread stacks would then show that one of the threads had made a call to Object.wait.
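
To make the "timeout plus stack examination" idea concrete, here is a rough sketch, not a finished test runner. HangWatchdog, runWithTimeout and lostNotification are hypothetical names I picked for this example; the only library calls used are Thread.join(long), Thread.getState() and Thread.getStackTrace().

```java
/** Sketch: run a test body with a time limit and report where it is stuck. */
class HangWatchdog {
    /**
     * Runs body on its own thread and waits up to timeoutMillis for it.
     * Returns null if the body finished, otherwise a description of the stuck thread.
     */
    static String runWithTimeout(Runnable body, long timeoutMillis) {
        Thread t = new Thread(body, "test-body");
        t.setDaemon(true); // a hung test body must not keep the JVM alive
        t.start();
        try {
            t.join(timeoutMillis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        if (!t.isAlive()) {
            return null; // finished in time
        }
        StringBuilder sb = new StringBuilder("timed out in state " + t.getState());
        for (StackTraceElement frame : t.getStackTrace()) {
            sb.append("\n    at ").append(frame);
        }
        return sb.toString();
    }

    /** A body that waits for a notification that nobody will ever send. */
    static Runnable lostNotification() {
        final Object signal = new Object();
        return new Runnable() {
            public void run() {
                synchronized (signal) {
                    try {
                        signal.wait();
                    } catch (InterruptedException e) { /* ignore */ }
                }
            }
        };
    }

    public static void main(String[] args) {
        System.out.println(runWithTimeout(lostNotification(), 500));
    }
}
```

For the hanging body, the report shows the thread in state WAITING with a frame inside java.lang.Object's wait at the top of the stack, which is the hint that a notification was missed.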

There are probably more elegant ways of doing this. Two things come to my mind right now:

The test could be run with a different version of Object.wait that includes a timeout. If the time is exceeded, it throws an exception, forcing the test to fail.

The problem again is the choice of the timeout length. Some tests could potentially run for a very long time and then succeed, and that is the expected behavior.
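
The wait-with-timeout idea could look roughly like the sketch below. TimedWait, waitOrFail and WaitTimeoutException are names I invented for illustration. One caveat: Object.wait(long) does not report why it returned, so the sketch infers a timeout from the elapsed time. That is only a heuristic, since a notification that arrives right at the deadline, or a spurious wakeup, would be misclassified.

```java
/** Sketch of a replacement for Object.wait() that fails instead of hanging. */
class TimedWait {
    /** Thrown when no notification arrived within the time limit. */
    static class WaitTimeoutException extends RuntimeException {
        WaitTimeoutException(String message) { super(message); }
    }

    /**
     * Waits on lock for at most timeoutMillis; the caller must hold lock's monitor.
     * Assumption: if the full time has elapsed, the wait timed out.
     */
    static void waitOrFail(Object lock, long timeoutMillis) throws InterruptedException {
        long start = System.nanoTime();
        lock.wait(timeoutMillis);
        long elapsedNanos = System.nanoTime() - start;
        if (elapsedNanos >= timeoutMillis * 1000000L) {
            throw new WaitTimeoutException("no notification within " + timeoutMillis + " ms");
        }
    }

    /** Demo: nobody ever notifies; returns true if the timeout was reported. */
    static boolean timesOutWhenNobodyNotifies(long timeoutMillis) {
        Object signal = new Object();
        synchronized (signal) {
            try {
                waitOrFail(signal, timeoutMillis);
                return false;
            } catch (WaitTimeoutException e) {
                return true;
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
    }

    /** Demo: a notifier runs while we wait; returns true if no timeout was reported. */
    static boolean returnsWhenNotified() {
        final Object signal = new Object();
        synchronized (signal) {
            Thread notifier = new Thread(new Runnable() {
                public void run() {
                    synchronized (signal) { signal.notify(); }
                }
            });
            // The notifier blocks on the monitor until wait() below releases it.
            notifier.start();
            try {
                waitOrFail(signal, 5000);
                return true;
            } catch (WaitTimeoutException e) {
                return false;
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
    }

    public static void main(String[] args) {
        System.out.println("timeout reported: " + timesOutWhenNobodyNotifies(200));
        System.out.println("normal wakeup:    " + returnsWhenNotified());
    }
}
```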

The most flexible thing to do is to set the timeout length in an annotation, the way JUnit 4.0 already does it:

@Test(timeout = 60000L) // JUnit 4 timeouts are given in milliseconds
public void testNotifyTooEarly() ...

In some cases, it may be possible to detect that a thread cannot be woken up again, either because there are no other user threads left alive, or because there is a deadlock. The first case is easy to check just before the call to Object.wait (Update: My tests indicate that a wait without timeout should be broken down into a series of waits with timeouts and interspersed checks of the number of living threads); the second case would require the deadlock detector.
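
The update above, a series of timed waits with liveness checks in between, might be sketched like this. All names (AbandonedWaitDetector, waitWhileNotifiersAlive, AbandonedWaitException) are hypothetical. To keep the example small, the caller passes in the threads that could still send the notification; that is my simplification, and a real tool would presumably enumerate the live user threads itself. As with any timed Object.wait, a timeout is inferred from elapsed time, so this is a heuristic.

```java
/** Sketch: replace an untimed wait with short timed waits plus liveness checks. */
class AbandonedWaitDetector {
    /** Thrown when every thread that could have sent the notification has ended. */
    static class AbandonedWaitException extends RuntimeException {
        AbandonedWaitException(String message) { super(message); }
    }

    /**
     * Waits on lock in slices of sliceMillis; the caller must hold lock's monitor.
     * Returns normally when a slice ends early (treated as a notification).
     * Throws if a slice times out and none of the notifiers is still alive.
     */
    static void waitWhileNotifiersAlive(Object lock, long sliceMillis, Thread... notifiers)
            throws InterruptedException {
        while (true) {
            long start = System.nanoTime();
            lock.wait(sliceMillis);
            boolean timedOut = System.nanoTime() - start >= sliceMillis * 1000000L;
            if (!timedOut) {
                return; // woke up before the slice ended
            }
            boolean anyAlive = false;
            for (Thread t : notifiers) {
                if (t.isAlive()) {
                    anyAlive = true;
                }
            }
            if (!anyAlive) {
                throw new AbandonedWaitException("no live thread left to call notify");
            }
        }
    }

    /** Demo: the worker notifies too early and ends; returns true if that is detected. */
    static boolean demo() {
        final Object signal = new Object();
        Thread worker = new Thread(new Runnable() {
            public void run() {
                synchronized (signal) { signal.notify(); } // nobody is waiting yet
            }
        });
        worker.start();
        try {
            worker.join(); // force the "notify too early" ordering
            synchronized (signal) {
                waitWhileNotifiersAlive(signal, 100, worker);
            }
            return false; // would mean a notification arrived after all
        } catch (AbandonedWaitException e) {
            return true;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("abandoned wait detected: " + demo());
    }
}
```

Instead of hanging forever, the waiting thread gives up after the first slice in which it finds that nobody is left to wake it.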

I’m confident I can provide a system that’s useful here, too. And I’m glad I have a smart girlfriend.
