I found another kind of concurrency problem that can easily be caught using the random delays, and it’s a very common one: It’s an atomicity violation caused when the result of one synchronized method is used as input for another synchronized method, but the whole thing isn’t part of a synchronized block. The example the DrJava developers at Rice are most familiar with is the following:
public synchronized int getLength() { ... }
public synchronized void remove(int offs, int len) { ... }
...
remove(getLength()-1, 1);
The code is supposed to remove the last character of the document. Both methods are synchronized, the two calls are not atomic, i.e. the thread may get preempted after the call to getLength
, another thread may change the document and its length, and then the original thread resumes with an incorrect length, removing either a character that’s not the last or accessing out of bounds.
If at least one method of such a two-call process is synchronized, then a delay will get inserted between the two methods, and that greatly increases the probability that preemption and corruption will occur and a unit test detects it. I’m in the process of writing a concise demonstration.
I also came up with an improvement of detecting “doomed wait
s” that I need to put in writing right now: Regardless of whether I decrement and increment the “running threads” (maybe “live threads” is a better name”) variable at monitorenter
opcodes and use that to determine whether concurrency exists, I should keep a second “running threads” variable that does get decremented before a monitorenter
and incremented after it.
This “running threads” variable is then only used to determine “doomed wait
s” and deadlocks: If a call to a wait
is made, and only one user thread is running, regardless of how many are alive, then the wait
is doomed and an exception is thrown. This situation could arise if the thread that should send the notification is involved in a deadlock.
There is a small problem, though: Unless I keep track of which locks a thread owns (strategy 3), the “running threads” variable will be decremented before a monitorenter
even if the thread already owns that lock and thus does not get blocked. If the variable gets decremented, and then a task switch occurs to a thread that calls wait
, then that wait
could erroneously be declared as “doomed”.
The solution to this problem that I came up with while napping this morning after my all-nighter is this: Don’t immediately declare a “doomed wait
“, but instead set a “progress made” flag to false
and wait a few milliseconds. If a thread passes through its monitorenter
— and therefore wasn’t blocked — it increments the “running threads” variable and sets the “progress made” flag to true
. When the original thread resumes, it checks if the “progress made” flag is still false
and only then declares the wait
as “doomed”.
I need to get to campus now so I can perhaps discuss a few ideas with Corky.