Ideas and Another Bug to Catch

I found another kind of concurrency problem that can easily be caught using the random delays, and it’s a very common one: It’s an atomicity violation caused when the result of one synchronized method is used as input for another synchronized method, but the whole thing isn’t part of a synchronized block. The example the DrJava developers at Rice are most familiar with is the following:

public synchronized int getLength() { ... }
public synchronized void remove(int offs, int len) { ... }
...
remove(getLength()-1, 1);

The code is supposed to remove the last character of the document. Both methods are synchronized, the two calls are not atomic, i.e. the thread may get preempted after the call to getLength, another thread may change the document and its length, and then the original thread resumes with an incorrect length, removing either a character that’s not the last or accessing out of bounds.

If at least one method of such a two-call process is synchronized, then a delay will get inserted between the two methods, and that greatly increases the probability that preemption and corruption will occur and a unit test detects it. I’m in the process of writing a concise demonstration.

I also came up with an improvement of detecting “doomed waits” that I need to put in writing right now: Regardless of whether I decrement and increment the “running threads” (maybe “live threads” is a better name”) variable at monitorenter opcodes and use that to determine whether concurrency exists, I should keep a second “running threads” variable that does get decremented before a monitorenter and incremented after it.

This “running threads” variable is then only used to determine “doomed waits” and deadlocks: If a call to a wait is made, and only one user thread is running, regardless of how many are alive, then the wait is doomed and an exception is thrown. This situation could arise if the thread that should send the notification is involved in a deadlock.

There is a small problem, though: Unless I keep track of which locks a thread owns (strategy 3), the “running threads” variable will be decremented before a monitorenter even if the thread already owns that lock and thus does not get blocked. If the variable gets decremented, and then a task switch occurs to a thread that calls wait, then that wait could erroneously be declared as “doomed”.

The solution to this problem that I came up with while napping this morning after my all-nighter is this: Don’t immediately declare a “doomed wait“, but instead set a “progress made” flag to false and wait a few milliseconds. If a thread passes through its monitorenter — and therefore wasn’t blocked — it increments the “running threads” variable and sets the “progress made” flag to true. When the original thread resumes, it checks if the “progress made” flag is still false and only then declares the wait as “doomed”.

I need to get to campus now so I can perhaps discuss a few ideas with Corky.

Share

About Mathias

Software development engineer. Principal developer of DrJava. Recent Ph.D. graduate from the Department of Computer Science at Rice University.
This entry was posted in Concurrent Unit Testing. Bookmark the permalink.

Leave a Reply