How’s the Debugging Going?

I don’t really read any comics. I rarely did on paper as a kid, and I don’t really do it on the web either. The one exception is Piled Higher and Deeper, the grad student comic strip. Why I read it should be clear, and if not, the current comic should make it blatantly obvious:

How's the debugging going? by www.phdcomics.com

Why is this so appropriate, you ask? Oh, I found another bug in my algorithm.

setOldThread and isOldThread

The instrumentation should actually be pretty easy. I just looked at the bytecode of this class:

public class Test {
    public static boolean isOldThread() {
        return true;
    }
    public static void setOldThread() { }

    public static void main(String[] args) {
        if (isOldThread()) {
            // some code
        }
        else {
            setOldThread();
        }
    }
}

and the main method contains these instructions:

invokestatic Test.isOldThread()Z
ifeq 5
// some code
goto 6
invokestatic Test.setOldThread()V
return

The invokestatic Test.isOldThread()Z in line 1 puts the return value on the stack. The ifeq in line 2 jumps over the then block if the return value was false; the goto in line 4 jumps over the else block if it was true and the then block has already been executed.

This just needs to be turned into

invokestatic java/lang/Thread.currentThread()Ljava/lang/Thread;
getfield java/lang/Thread.$$$oldThread$$$
ifeq 6
// some code
goto 9
invokestatic java/lang/Thread.currentThread()Ljava/lang/Thread;
iconst_1
putfield java/lang/Thread.$$$oldThread$$$
return

The old line 1 gets replaced by the new lines 1 and 2, which get the current thread, and then get its $$$oldThread$$$ field.

The old line 5 gets replaced by the new lines 6 to 8, which get the current thread, put true on the stack, and then set the thread’s $$$oldThread$$$ field. (Note the order: putfield expects the object reference below the value on the stack.)

monitorenter and monitorexit are even easier to handle.

// put object ref on stack
invokestatic edu/rice/cs/cunit/SyncPointBuffer.monitorEnter(Ljava/lang/Object;)V

just has to be changed into

// put object ref on stack
monitorenter

Exactly the same applies to monitorexit. The value on top of the stack isn’t passed as the first parameter anymore; now it’s directly consumed by the monitorenter or monitorexit instruction.
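
For illustration, here is what such a rewriting strategy could look like with the ASM bytecode library. This is only a sketch: the library choice, the visitor name, and the owner class are my assumptions, not necessarily what my framework actually does.

import org.objectweb.asm.MethodVisitor;
import org.objectweb.asm.Opcodes;

// Hypothetical sketch: rewrite the marker calls described above into the
// real field accesses and monitor instructions.
class MarkerRewriter extends MethodVisitor {
    private static final String THREAD = "java/lang/Thread";
    private static final String BUFFER = "edu/rice/cs/cunit/SyncPointBuffer";

    MarkerRewriter(MethodVisitor mv) { super(Opcodes.ASM9, mv); }

    @Override
    public void visitMethodInsn(int opcode, String owner, String name,
                                String desc, boolean itf) {
        if (opcode == Opcodes.INVOKESTATIC && owner.equals(BUFFER)) {
            if (name.equals("isOldThread")) {
                // Thread.currentThread().$$$oldThread$$$ is left on the stack
                super.visitMethodInsn(Opcodes.INVOKESTATIC, THREAD,
                        "currentThread", "()Ljava/lang/Thread;", false);
                super.visitFieldInsn(Opcodes.GETFIELD, THREAD,
                        "$$$oldThread$$$", "Z");
                return;
            }
            if (name.equals("setOldThread")) {
                // Thread.currentThread().$$$oldThread$$$ = true;
                super.visitMethodInsn(Opcodes.INVOKESTATIC, THREAD,
                        "currentThread", "()Ljava/lang/Thread;", false);
                super.visitInsn(Opcodes.ICONST_1);
                super.visitFieldInsn(Opcodes.PUTFIELD, THREAD,
                        "$$$oldThread$$$", "Z");
                return;
            }
            if (name.equals("monitorEnter")) {
                super.visitInsn(Opcodes.MONITORENTER);  // object ref is already on the stack
                return;
            }
            if (name.equals("monitorExit")) {
                super.visitInsn(Opcodes.MONITOREXIT);
                return;
            }
        }
        super.visitMethodInsn(opcode, owner, name, desc, itf);
    }
}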

At the End is at the Beginning

I’ve realized that the index in the schedule has to remain the same until a thread is done with a synchronization point (or rather the block following it). Only then, i.e. at the end of the block, may the index be incremented. This way, if a thread is currently executing its code and a second thread enters compactWait, the second thread won’t proceed but will wait to be woken up. Theoretically, I could create a compactAdvance method that gets executed at the end of a block.

But the end of one block is the beginning of the next block, so both pieces of code can be put into compactWait. The first block of a thread is the exception: its beginning is not the end of a previous block, so I need a conditional. The new, redesigned algorithm looks like this:

compactWait:

  1. Repeat this… (“RETRY” loop)
    1. monitorenter _waitAlgoObj (beginning of main synchronized block)
      1. If this is NOT the first sync point for this thread…
        1. Advance the indices.
      2. If the buffer has never been loaded, or if the index is at the end of the buffer…
        1. Load the buffer and reset the indices.
        2. Wake up all threads waiting for a buffer update.
      3. If the current sync point marks the end of the schedule…
        1. Disable scheduled replay.
        2. Wake up all threads waiting for a buffer update.
        3. monitorexit _waitAlgoObj
        4. Break out of the “RETRY” loop.
      4. If there’s a thread waiting for this wait array entry…
        1. Notify it.
        2. Set a flag to scan ahead for the current sync point (“SCAN”).
        3. Do not advance indices.
      5. Otherwise…
        1. If the current sync point in the array is the right one…
          1. Do not advance indices.
          2. Allow the thread to exit the “RETRY” loop.
        2. Otherwise…
          1. Set a flag to scan ahead (“SCAN”).
      6. If the “SCAN” flag is set (i.e. either a thread was woken up or the current sync point did not match)…
        1. Look for the sync points in the remaining part of the array.
        2. If it could be found…
          1. Insert a new Object() into corresponding slot of the wait array.
          2. Set that Object() as “wait object” for this method.
          3. Allow the thread to exit the “RETRY” loop.
        3. Otherwise…
          1. Set the new buffer wait object as “wait object” for this method.
          2. Do not allow the thread to exit the “RETRY” loop.
        4. monitorenter waitObj (beginning of wait object synchronized block)
      7. monitorexit _waitAlgoObj (end of main synchronized block)
    2. If a “wait object” has been set for this method…
      1. Call wait() on that object (make sure to continue waiting if interrupted).
      2. monitorexit waitObj (end of wait object synchronized block)
      3. Do not advance indices.
  2. …while the thread is not allowed to exit the loop and scheduled replay is enabled (end “RETRY” loop).

compactThreadExit:

  1. monitorenter _waitAlgoObj (beginning of main synchronized block)
    1. If this is NOT the first sync point for this thread… (Note: Conditional probably not necessary)
      1. Advance the indices.
    2. If the index is at the end of the buffer…
      1. Load the buffer and reset the indices.
      2. Wake up all threads waiting for a buffer update.
    3. Otherwise…
      1. monitorenter _waitArray[index] (beginning of wait object synchronized block)
      2. If there’s a thread waiting for this wait array entry…
        1. Notify it.
      3. Otherwise, if the current sync point marks the end of the schedule…
        1. Disable scheduled replay.
        2. Wake up all threads waiting for a buffer update.
      4. Otherwise…
        1. Do nothing, because there’s still a thread awake that will reach this sync point.
      5. monitorexit _waitArray[index] (end of wait object synchronized block)
  2. monitorexit _waitAlgoObj (end of main synchronized block)

The first good news is that the modified algorithm passes the first simple test without violated assertions or invalid end states:

(Spin Version 4.2.3 -- 5 February 2005)
	+ Partial Order Reduction

Full statespace search for:
	never claim         	- (not selected)
	assertion violations	+
	cycle checks       	- (disabled by -DSAFETY)
	invalid end states	+

State-vector 196 byte, depth reached 140, errors: 0
    1486 states, stored
     725 states, matched
    2211 transitions (= stored+matched)
       0 atomic steps
hash conflicts: 0 (resolved)

Stats on memory usage (in Megabytes):
0.309 	equivalent memory usage for states (stored*(State-vector + overhead))
0.490 	actual memory usage for states (unsuccessful compression: 158.38%)
	State-vector as stored = 317 byte + 12 byte overhead
2.097 	memory used for hash table (-w19)
0.320 	memory used for DFS stack (-m10000)
0.159 	other (proc and chan stacks)
0.081 	memory lost to fragmentation
2.827 	total actual memory usage

I’m actually pretty sure compactThreadExit can always advance the indices without checking. There should never be a compactThreadExit without a previous compactWait. But I’ll have to check that.

The conditional for “If this is NOT the first sync point for this thread…” means that I have to add another variable to the java.lang.Thread class, and I’ll also have to get and set this variable in compactWait. That’s a little bit difficult since the variable doesn’t exist in the normal uninstrumented class.

So I think what I’ll do is add two dummy methods, public static void setOldThread() { } and public static boolean isOldThread() { return true; }, to SyncPointBuffer and then call them from within compactWait. The calls, however, only serve as markers for an instrumentation strategy, which replaces each call with the code that sets the flag to true or reads its value.
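
A rough sketch of how the top of compactWait could then look in source, before instrumentation (all names except the markers are assumptions, and the index layout is invented):

// Hypothetical sketch; the marker calls below get replaced by the
// instrumentation with direct accesses to Thread.$$$oldThread$$$.
public class SyncPointBuffer {
    private static final Object _waitAlgoObj = new Object();  // assumed name
    private static int _index;                                // assumed layout

    public static boolean isOldThread() { return true; }  // dummy marker
    public static void setOldThread() { }                 // dummy marker

    public static void compactWait(long code, long tid) {
        synchronized (_waitAlgoObj) {
            if (isOldThread()) {
                _index += 2;        // end of previous block: advance past the (code, tid) pair
            } else {
                setOldThread();     // first sync point for this thread
            }
            // ... buffer reload, end-of-schedule check, scan ahead, wait ...
        }
    }
}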

I plan to use the same technique for embedding monitorenter and monitorexit instructions that don’t correspond to synchronized blocks.

Now I have to do laundry, though. I wish I could do that and coding concurrently…

Assertions Violated

Indeed, there exist schedules that let the synchronization points occur in the incorrect order. I added two assertions and a counter to the framework that make sure that the program is actually following the schedule. Note that these assertions are outside any kind of lock, of course:

...
printf("[%d] replayWait ends ------------------------\n", tid);

/*-------------------------------------------------------------
 * Assert that the events are happening in the correct order */
atomic {
    printf("@@@@@ Asserting actual [%d %d] == expected [%d %d] @@@@@\n",
           code, tid, schedule[outScheduleIndex], schedule[outScheduleIndex+1]);
    assert(code == schedule[outScheduleIndex]);
    assert(tid == schedule[outScheduleIndex+1]);
    outScheduleIndex = outScheduleIndex + 2;
}

Now I let SPIN check all possible interleavings for violated assertions, and several violating interleavings were found:

pan: assertion violated (tid==schedule[(outScheduleIndex+1)]) (at depth 76)
pan: wrote pan_in.trail
pan: reducing search depth to 75
pan: wrote pan_in.trail
pan: reducing search depth to 73
pan: wrote pan_in.trail
pan: reducing search depth to 71
pan: wrote pan_in.trail
pan: reducing search depth to 69
pan: wrote pan_in.trail
pan: reducing search depth to 69
pan: wrote pan_in.trail
pan: reducing search depth to 65
pan: wrote pan_in.trail
pan: reducing search depth to 61
pan: wrote pan_in.trail
pan: reducing search depth to 60
pan: wrote pan_in.trail
pan: reducing search depth to 59
pan: wrote pan_in.trail
pan: reducing search depth to 58
pan: wrote pan_in.trail
pan: reducing search depth to 56
pan: wrote pan_in.trail
pan: reducing search depth to 54
pan: wrote pan_in.trail
pan: reducing search depth to 53
pan: wrote pan_in.trail
pan: reducing search depth to 52
(Spin Version 4.2.3 -- 5 February 2005)
	+ Partial Order Reduction

Full statespace search for:
	never claim         	- (not selected)
	assertion violations	+
	cycle checks       	- (disabled by -DSAFETY)
	invalid end states	+

State-vector 192 byte, depth reached 139, errors: 14
     953 states, stored
     615 states, matched
    1568 transitions (= stored+matched)
       0 atomic steps
hash conflicts: 0 (resolved)

Stats on memory usage (in Megabytes):
0.194 	equivalent memory usage for states (stored*(State-vector + overhead))
0.706 	actual memory usage for states (unsuccessful compression: 363.30%)
	State-vector as stored = 729 byte + 12 byte overhead
2.097 	memory used for hash table (-w19)
0.002 	memory used for DFS stack (-m52)
0.082 	memory lost to fragmentation
2.724 	total actual memory usage

Here’s a summary of the shortest trace that violates the assertions. It looks exactly as I suspected last night:

preparing trail, please wait...done
Starting :init: with pid 0
  1:	proc  0 (:init:) line  22 "pan_in" (state 1)
	[schedule[0] = 1]
...
 10:	proc  1 (thread) line  12 "waitalgo.pml" (state 1)
	[printf('[%d] replayWait(%d, %d)\\n',tid,code,tid)]
...
 28:	proc  1 (thread) line  95 "waitalgo.pml" (state 63)
	[printf('[%d] sync point match at index = %d\\n',tid,replayIndex)]
...
[1] replayWait ends ------------------------
 36:	proc  1 (thread) line 236 "waitalgo.pml" (state 159)
	[printf('[%d] replayWait ends ------------------------\\n',tid)]
	
 37:	proc  2 (thread) line  22 "waitalgo.pml" (state 2)
	[((algo_lock==0))]	
...
 45:	proc  2 (thread) line  95 "waitalgo.pml" (state 63)
	[printf('[%d] sync point match at index = %d\\n',tid,replayIndex)]
...
[2] replayWait ends ------------------------
 53:	proc  2 (thread) line 236 "waitalgo.pml" (state 159)
	[printf('[%d] replayWait ends ------------------------\\n',tid)]
	
...
 54:	proc  2 (thread) line 243 "waitalgo.pml" (state 161)
	[assert((code==schedule[outScheduleIndex]))]	
spin: line 244 "waitalgo.pml", Error: assertion violated
spin: text of failed assertion: assert((tid==schedule[(outScheduleIndex+1)]))
#processes: 4
 54:	proc  3 (thread) line  17 "waitalgo.pml" (state 156)
 54:	proc  2 (thread) line 244 "waitalgo.pml" (state 162)
 54:	proc  1 (thread) line 240 "waitalgo.pml" (state 164)
 54:	proc  0 (:init:) line  30 "pan_in" (state 12)
4 processes created

Assertions are a good thing, but it’s always sad when they are violated, along with your hopes.

Perforce Triggers and Windows

I’ve spent a few hours trying to learn about Perforce triggers. These triggers allow the server to automatically run scripts when users perform certain actions, like submitting files. What I wanted to do is automatically upload the RiceMBS to its website and have the javadocs rebuilt; this is exactly what these triggers are made for. I have written the bash script that deletes the old files, synchronizes with Perforce, and then generates new javadocs.
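
For reference, a server-side trigger of this kind is registered in the server’s trigger table; an entry might look along these lines (the depot path and script name are hypothetical; the exact fields depend on the server version, see p4 help triggers):

update-docs change-commit //depot/ricembs/... "/usr/local/bin/update-docs.sh %changelist%"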

However, I just can’t get damn Windows to automatically log into my SSH account.

Really. Damn Windows. When I run the commands by hand from the command line, everything works beautifully, but when I let the service run it in the background, I don’t even get an error message.

I need to get away from Windows… Then again, I’m only using Perforce as a one-user repository so my data is backed up and can be rolled back.

Another Mistake?

I haven’t actually verified it, but I have a gut feeling there’s another mistake in my algorithm. Let’s look at the part where a sync point actually matches right away. I’ve excluded all the parts that don’t get executed; the conditionals that don’t get entered are shown without their sub-steps:

  1. Repeat this… (“RETRY” loop)
    1. monitorenter _waitAlgoObj (beginning of main synchronized block)
      1. If the buffer has never been loaded, or if the index is at the end of the buffer…
      2. If the current sync point marks the end of the schedule…
      3. If there’s a thread waiting for this wait array entry…
      4. Otherwise…
        1. If the current sync point in the array is the right one…
          1. Advance indices.
          2. Allow the thread to exit the “RETRY” loop.
        2. Otherwise…
      5. If the “SCAN” flag is set (i.e. either a thread was woken up or the current sync point did not match)…
      6. monitorexit _waitAlgoObj (end of main synchronized block)
    2. If a “wait object” has been set for this method…
  2. …while the thread is not allowed to exit the loop and scheduled replay is enabled (end “RETRY” loop).

Right after the match is found, the indices are advanced, the algorithm lock is released, and the method is exited. Right at this point, another thread could come along, find another match, race past the first thread and mess up the schedule.

How do I prevent this? Maybe advance the indices at the next wait point. How exactly? I don’t know yet.

Giving It a Good SPIN

I discussed earlier how synchronized blocks are not going to do it anymore, because I have to interleave the regions in which the locks are held. That makes the code significantly more complex. Therefore, I decided to dig out SPIN and Promela again. Fortunately, I had taken a course on them, COMP 607: Automated Program Verification.

This project is a lot more complex than the toy examples I had to work with before. One thing that I found particularly annoying is that Promela doesn’t have a notion of procedures. The code in replayWait and replayThreadExit is going to be executed in many places, and to get it correct, I really want to avoid code duplication. And that’s what procedures should be there for… In Promela, I could have spawned a new process and waited for its completion, but I wanted to have only one Promela process per simulated thread.

Fortunately, SPIN runs the C preprocessor over the Promela source code. The preprocessor is mainly used for #define directives, but it also gives me the #include directive. So I put the different functions into different include files and assume that the parameters are passed in a number of local variables, namely tid and possibly code. That’s rather ugly (especially if you stick to the C mantra never to actually #include code), but it does work.

Now I have 384 lines of Promela code that model the algorithm described earlier pretty much down to a T. I’ve written one test file so far; it just has three threads with one sync point each:

/* Concutest - Concurrent Unit Testing
*
* Test 1
*
* Written by Mathias Ricken, Rice University.
*/

#define SCHEDULE_SIZE 8
#define REPLAY_SIZE 4
#define REPLAY_SIZE_X2 8

#include "vars.pml"

proctype thread(int tid) {
    local int code = 1;
#include "waitalgo.pml"

#include "exitalgo.pml"
}

init {
    schedule[0] = 1; schedule[1] = 1;
    schedule[2] = 1; schedule[3] = 2;
    schedule[4] = 1; schedule[5] = 3;
    schedule[6] = 3; schedule[7] = 0;

    run thread(1);
    run thread(2);
    run thread(3);
}

Ugly, huh? As you can see, I use #include directives to include common global variables (vars.pml) and to make “calls” to waitalgo.pml (compactWait) and exitalgo.pml (compactThreadExit). There’s another file, loadbuffer.pml, that factors out loading the next sync points from the provided schedule, resetting the indices, and notifying threads waiting for a buffer update.

So far, it passes the SPIN “invalid end states” verification:

(Spin Version 4.2.3 -- 5 February 2005)

Full statespace search for:
	never claim         	- (not selected)
	assertion violations	- (disabled by -A flag)
	cycle checks       	- (disabled by -DSAFETY)
	invalid end states	+

State-vector 188 byte, depth reached 135, errors: 0
   11377 states, stored
   15019 states, matched
   26396 transitions (= stored+matched)
       0 atomic steps
hash conflicts: 90 (resolved)

Stats on memory usage (in Megabytes):
2.184 	equivalent memory usage for states (stored*(State-vector + overhead))
2.336 	actual memory usage for states (unsuccessful compression: 106.92%)
	State-vector as stored = 201 byte + 4 byte overhead
2.097 	memory used for hash table (-w19)
0.280 	memory used for DFS stack (-m10000)
0.113 	other (proc and chan stacks)
0.084 	memory lost to fragmentation
4.630 	total actual memory usage
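
For reference, a run like the one above is typically produced with a command sequence along these lines (reconstructed from the flags shown in the output; the file name is an assumption):

spin -a test1.pml         # generate pan.c, the verifier, from the model
cc -DSAFETY -o pan pan.c  # compile for safety properties only (no cycle checks)
./pan -A                  # run the search; -A ignores assert() statements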

This is only a very basic test case, but passing SPIN’s verification means that, at least with this schedule, there is no interleaving of the threads that can lead to an error. Now I’ll have to create more complicated schedules and threads and simulate them. Once I’ve verified that the algorithm as implemented in Promela is correct, I have to make a careful translation to Java.

Kooprey

We got a request for Kooprey from a colleague today. Kooprey is our automatic generator for predictive recursive descent parsers, written in our object-oriented style.

I have promised to take another look at it, maybe polish the code a little, and then make it available.

Mock-Up

I’ve written a mock-up test of the algorithm with the interleaved synchronized blocks, using a short, inefficient Monitor class that imitates the behavior of Java monitors:

import java.util.LinkedList;
import java.util.Queue;
import java.util.concurrent.atomic.AtomicBoolean;

class Monitor {
    private AtomicBoolean _b = new AtomicBoolean(false);
    private Queue<Object> _w = new LinkedList<Object>();

    // spin until the monitor is free, then take it
    public void enter() {
        while (!_b.compareAndSet(false, true)) {
            try { Thread.sleep(1); } catch (InterruptedException e) { }
        }
    }

    public void exit() {
        _b.set(false);
    }

    // release the monitor, wait to be notified, then re-acquire the monitor
    public void myWait() throws InterruptedException {
        Object o = new Object();
        _w.add(o);
        exit();
        boolean waiting = true;
        do {
            try {
                synchronized (o) {
                    o.wait();
                }
                waiting = false;
            }
            catch (InterruptedException e) {
                e.printStackTrace();
            }
        } while (waiting);
        enter();
    }

    // wake up the longest-waiting thread, if there is one
    public void myNotify() {
        Object o = _w.poll();
        if (o != null) {
            synchronized (o) {
                o.notify();
            }
        }
    }

    // wake up all waiting threads
    public void myNotifyAll() {
        Object o;
        while ((o = _w.poll()) != null) {
            synchronized (o) {
                o.notify();
            }
        }
    }
}

Using this class, I’ve tried to test the implementation, and it seems to work. Now I just have to find a way to write this easily with Java’s regular monitors and the monitorenter and monitorexit instructions. I don’t feel like writing the two methods entirely in Java bytecode, so I’ll probably write two empty methods, public static void monitorEnter(final Object o) and public static void monitorExit(final Object o), and then write an instrumentation strategy that inserts the bytecode aload_0; monitorenter; return and aload_0; monitorexit; return, respectively, as their bodies.
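
As an illustration only (my instrumentation framework is not ASM; the library and names here are assumptions), generating such a method body with ASM could look like this:

import org.objectweb.asm.ClassWriter;
import org.objectweb.asm.MethodVisitor;
import static org.objectweb.asm.Opcodes.*;

// Hypothetical sketch: emit "aload_0; monitorenter; return" as the body of
// public static void monitorEnter(Object o).
class MonitorMethodEmitter {
    static void emitMonitorEnter(ClassWriter cw) {
        MethodVisitor mv = cw.visitMethod(ACC_PUBLIC | ACC_STATIC,
                "monitorEnter", "(Ljava/lang/Object;)V", null, null);
        mv.visitCode();
        mv.visitVarInsn(ALOAD, 0);   // push the object reference
        mv.visitInsn(MONITORENTER);  // acquire its monitor...
        mv.visitInsn(RETURN);        // ...and return while still holding it
        mv.visitMaxs(1, 1);          // max stack 1, one local (the parameter)
        mv.visitEnd();
    }
}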

When interleaving synchronized blocks this way, there is a greater risk of forgetting to release ownership of a monitor somewhere, producing a deadlock. I think it might be a good idea to also implement this algorithm in Promela and use SPIN to verify it.

I’ve found another mistake in the algorithm, by the way: In compactThreadExit, I also have to check if I’ve reached the end of the schedule, and if that is the case, turn off scheduled replay and notify all threads waiting for a new buffer. Here’s the new, corrected algorithm:

compactWait:

  1. Repeat this… (“RETRY” loop)
    1. monitorenter _waitAlgoObj (beginning of main synchronized block)
      1. If the buffer has never been loaded, or if the index is at the end of the buffer…
        1. Load the buffer and reset the indices.
        2. Wake up all threads waiting for a buffer update.
      2. If the current sync point marks the end of the schedule…
        1. Disable scheduled replay.
        2. Wake up all threads waiting for a buffer update.
        3. monitorexit _waitAlgoObj
        4. Break out of the “RETRY” loop.
      3. If there’s a thread waiting for this wait array entry…
        1. Notify it.
        2. Set a flag to scan ahead for the current sync point (“SCAN”).
        3. Do not advance indices.
      4. Otherwise…
        1. If the current sync point in the array is the right one…
          1. Advance indices.
          2. Allow the thread to exit the “RETRY” loop.
        2. Otherwise…
          1. Set a flag to scan ahead (“SCAN”).
      5. If the “SCAN” flag is set (i.e. either a thread was woken up or the current sync point did not match)…
        1. Look for the sync points in the remaining part of the array.
        2. If it could be found…
          1. Insert a new Object() into corresponding slot of the wait array.
          2. Set that Object() as “wait object” for this method.
          3. Allow the thread to exit the “RETRY” loop.
        3. Otherwise…
          1. Set the new buffer wait object as “wait object” for this method.
          2. Do not allow the thread to exit the “RETRY” loop.
        4. monitorenter waitObj (beginning of wait object synchronized block)
      6. monitorexit _waitAlgoObj (end of main synchronized block)
    2. If a “wait object” has been set for this method…
      1. Call wait() on that object (make sure to continue waiting if interrupted).
      2. monitorexit waitObj (end of wait object synchronized block)
      3. If scheduled replay is still enabled and the thread is allowed to exit the “RETRY” loop (i.e. it was not a “new buffer” wait)…
        1. monitorenter _waitAlgoObj (beginning of main synchronized block)
          1. Advance the indices.
        2. monitorexit _waitAlgoObj (end of main synchronized block)
  2. …while the thread is not allowed to exit the loop and scheduled replay is enabled (end “RETRY” loop).

compactThreadExit:

  1. monitorenter _waitAlgoObj (beginning of main synchronized block)
    1. If the index is at the end of the buffer…
      1. Load the buffer and reset the indices.
      2. Wake up all threads waiting for a buffer update.
    2. Otherwise…
      1. monitorenter _waitArray[index] (beginning of wait object synchronized block)
      2. If there’s a thread waiting for this wait array entry…
        1. Notify it.
      3. Otherwise, if the current sync point marks the end of the schedule…
        1. Disable scheduled replay.
        2. Wake up all threads waiting for a buffer update.
      4. Otherwise…
        1. Do nothing, because there’s still a thread awake that will reach this sync point.
      5. monitorexit _waitArray[index] (end of wait object synchronized block)
  2. monitorexit _waitAlgoObj (end of main synchronized block)

In Java Bytecode…

I think I found a way to fulfill both requirements, but so far only in pure Java bytecode. In bytecode, acquiring and releasing monitors isn’t bound to any scoping and thus doesn’t have to happen in a nested, non-overlapping fashion.

If compactWait is written like this (a Java/Java bytecode blend):

public static void compactWait(long code, long tid) {
    ...
    // beginning of main synchronized block
    monitorenter _waitAlgoObj
    ...
    // beginning of waitObj synchronized block
    monitorenter waitObj
    // end of main synchronized block
    monitorexit _waitAlgoObj

    // wait for notification
    waitObj.wait()

    // end of waitObj synchronized block
    monitorexit waitObj
    ...
}

and compactThreadExit in a similar way:

public static void compactThreadExit(long tid) {
    ...
    // beginning of main synchronized block
    monitorenter _waitAlgoObj
    ...
    // beginning of _waitArray[index] synchronized block
    monitorenter _waitArray[index]
    // end of main synchronized block
    monitorexit _waitAlgoObj

    // notify sleeping thread
    _waitArray[index].notify()

    // end of _waitArray[index] synchronized block
    monitorexit _waitArray[index]
    ...
}

then the regions in which the two monitors are held overlap, and this prevents compactThreadExit from calling notify() before compactWait has had a chance to call wait().

Let’s look at a more detailed trace from the last posting:

  1. Thread 1 does monitorenter _waitAlgoObj
  2. Thread 1 wakes thread 2 up.
  3. Thread 2 wakes up.
  4. Thread 2 wants the _waitAlgoObj monitor…
  5. Thread 1 scans for its sync point and inserts a new wait object into _waitArray[index].
  6. Thread 1 does monitorenter _waitArray[index]
  7. Thread 1 does monitorexit _waitAlgoObj
  8. Thread 2 does monitorenter _waitAlgoObj
  9. Thread 2 advances the index.
  10. Thread 2 does monitorexit _waitAlgoObj
  11. Thread 2 exits compactWait and executes its code.
  12. Thread 2 enters compactThreadExit.
  13. Thread 2 does monitorenter _waitAlgoObj
  14. Thread 2 wants the _waitArray[index] monitor…
  15. Thread 1 calls wait() on that wait object.
  16. Thread 2 does monitorenter _waitArray[index]
  17. Thread 2 checks if there is an object in the wait array and, because there is, calls notify() on the wait object, waking thread 1 up.
  18. Thread 1 wakes up.
  19. Thread 2 does monitorexit _waitArray[index]
  20. Thread 2 exits compactThreadExit.
  21. Thread 2 terminates.
  22. Thread 1 does monitorexit _waitArray[index]
  23. Thread 1 exits compactWait and executes its code.

Whenever a thread wants to call Object.wait() or Object.notify(), the thread first has to own that object’s monitor. By acquiring that monitor in compactWait before the monitor for the entire algorithm is released, we can guarantee that wait() is called before notify() can be called in compactThreadExit.
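
In Java source, the same hand-over-hand pattern can be sketched with explicit locks from java.util.concurrent instead of raw bytecode. This is only an illustration of the overlapping-regions idea, not my actual implementation; all names are made up. algoLock plays the role of _waitAlgoObj, waitLock that of waitObj:

import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

class OverlapSketch {
    private final ReentrantLock algoLock = new ReentrantLock();
    private final ReentrantLock waitLock = new ReentrantLock();
    private final Condition woken = waitLock.newCondition();

    // compactWait's tail: take waitLock BEFORE giving up algoLock, so the
    // notifier cannot signal before this thread is committed to waiting.
    void waitSide() throws InterruptedException {
        algoLock.lock();
        try {
            // ... scan, insert wait object ...
            waitLock.lock();        // the two lock regions overlap here
        } finally {
            algoLock.unlock();
        }
        try {
            woken.await();          // releases waitLock while waiting
            // (a real version would loop around a condition predicate)
        } finally {
            waitLock.unlock();
        }
    }

    // compactThreadExit's tail: same hand-over-hand order.
    void notifySide() {
        algoLock.lock();
        try {
            waitLock.lock();        // blocks until the waiter is inside await()
        } finally {
            algoLock.unlock();
        }
        try {
            woken.signal();
        } finally {
            waitLock.unlock();
        }
    }
}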

I’m just not sure what the easiest way to test this is…

Right Direction, Not Right

I’m writing my little unit tests now, and in some cases, the fix with compactThreadExit actually helped. However, now there’s another problem:

Assume we have this schedule:

1 1
1 2
2 1

Let thread 2 go first. It’ll wait to be woken up at the second line of the schedule. Thread 1 then executes the synchronization point in line 1 right away and gets to its second synchronization point. At that point, the following things are supposed to happen:

  1. Thread 1 wakes thread 2 up.
  2. Thread 2 wakes up.
  3. Thread 1 scans for its sync point and inserts a new wait object into the wait array.
  4. Thread 1 calls wait() on that wait object.
  5. Thread 2 advances the index.
  6. Thread 2 exits compactWait and executes its code.
  7. Thread 2 enters compactThreadExit.
  8. Thread 2 checks if there is an object in the wait array and, because there is, calls notify() on the wait object, waking thread 1 up.
  9. Thread 1 wakes up.
  10. Thread 2 terminates.
  11. Thread 1 exits compactWait and executes its code.

Unfortunately, because thread 1 exits the synchronized block that prevents concurrent execution before it waits, thread 2 executes compactThreadExit before thread 1 is actually waiting:

  1. Thread 1 wakes thread 2 up.
  2. Thread 2 wakes up.
  3. Thread 1 scans for its sync point and inserts a new wait object into the wait array.
  4. Thread 2 advances the index.
  5. Thread 2 exits compactWait and executes its code.
  6. Thread 2 enters compactThreadExit.
  7. Thread 2 checks if there is an object in the wait array and, because there is, calls notify() on the wait object, but thread 1 wasn’t waiting yet!
  8. Thread 2 terminates.
  9. Thread 1 calls wait() on that wait object and never gets woken up!

The problem I’m facing now is this pair of contradictory requirements:

  1. I may not be in the synchronized block when thread 1 calls wait(), because then no other thread can enter the synchronized block (as previously noted).
  2. I may not exit the synchronized block that prevents concurrent execution until thread 1 has begun to wait, i.e. wait() has to be called inside the synchronized block.

It is obvious that a simple synchronized block cannot be used. I think a mutex might work, but I have to play around with the algorithm first. It’s amazing how complicated this is getting…

compactWait and compactThreadExit

Ok, now I have a strategy that inserts calls to edu.rice.cs.cunit.SyncPointBuffer.compactThreadExit(long tid) at the end of java.lang.Thread.exit, and it seems like that’s actually the last thing a thread does. Good. (Note: I’ve just realized the thread ID isn’t actually necessary in compactThreadExit, so maybe I’ll take it out later. Right now, I’ll leave it in as a debugging help.)

Now the algorithm consists of two pieces. Here’s the part in compactWait:

  1. Repeat this… (“RETRY” loop)
    1. Begin of synchronized block
      1. If the buffer has never been loaded, or if the index is at the end of the buffer…
        1. Load the buffer and reset the indices.
        2. Wake up all threads waiting for a buffer update.
      2. If the current sync point marks the end of the schedule…
        1. Disable scheduled replay.
        2. Wake up all threads waiting for a buffer update.
        3. Break out of the “RETRY” loop.
      3. If there’s a thread waiting for this wait array entry…
        1. Notify it.
        2. Set a flag to scan ahead for the current sync point (“SCAN”).
        3. Do not advance indices.
      4. Otherwise…
        1. If the current sync point in the array is the right one…
          1. Advance indices.
          2. Allow the thread to exit the “RETRY” loop.
        2. Otherwise…
          1. Set a flag to scan ahead (“SCAN”).
      5. If the “SCAN” flag is set (i.e. either a thread was woken up or the current sync point did not match)…
        1. Look for the sync points in the remaining part of the array.
        2. If it could be found…
          1. Insert a new Object() into corresponding slot of the wait array.
          2. Set that Object() as “wait object” for this method.
          3. Allow the thread to exit the “RETRY” loop.
        3. Otherwise…
          1. Set the new buffer wait object as “wait object” for this method.
          2. Do not allow the thread to exit the “RETRY” loop.
    2. End of synchronized block
    3. If a “wait object” has been set for this method…
      1. Call wait() on that object (make sure to continue waiting if interrupted).
      2. If scheduled replay is still enabled and the thread is allowed to exit the “RETRY” loop (i.e. it was not a “new buffer” wait)…
        1. Begin of synchronized block…
          1. Advance the indices.
        2. End of synchronized block.
  2. …while the thread is not allowed to exit the loop and scheduled replay is enabled (end “RETRY” loop).

This is basically the algorithm I had earlier, without any of the “fixes” to avoid deadlocks. These deadlocks can occur anywhere in the schedule if the order in which the threads reach their next synchronization points is unfortunate, i.e. the current thread A is put to sleep in favor of waking up sleeping thread B, and thread B then terminates without reaching another synchronization point.

Now here comes the second part of the algorithm, compactThreadExit:

  1. Begin of synchronized block…
    1. If there’s a thread waiting for this wait array entry…
      1. Notify it.
    2. Otherwise…
      1. Do nothing, because there’s still a thread awake that will reach this sync point.
  2. End of synchronized block.

In the situation described above, thread B terminates, but the last thing it does is check whether a thread is already waiting for the next sync point. If so, it wakes that thread up before terminating. If not, then there is still a thread awake that is going to reach this sync point, so nothing has to be done. (Yes, most of the time there’s only one thread awake at a time, but at the beginning of the program and after a buffer load, multiple threads are looking for their sync points concurrently.)
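
A minimal Java sketch of this second part, with names assumed from the earlier descriptions (this is a rough shape, not the real code):

// Hypothetical sketch of compactThreadExit; _waitAlgoObj, _waitArray, and
// _index are assumed names, and the wait array size is invented.
class ReplayScheduler {
    private final Object _waitAlgoObj = new Object();
    private final Object[] _waitArray = new Object[1024];
    private int _index;

    public void compactThreadExit(long tid) {
        synchronized (_waitAlgoObj) {
            Object waiter = _waitArray[_index];
            if (waiter != null) {
                // a thread is parked on the next sync point: hand over to it
                synchronized (waiter) {
                    waiter.notify();
                }
            }
            // otherwise do nothing: some thread is still awake and will
            // reach this sync point on its own
        }
    }
}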

Let’s see how this works…

When a Thread Terminates

I’m still dealing with this deadlock problem. In some cases, the following might happen: Thread 1 enters compactWait but finds out that thread 2 is waiting for this synchronization point. Thread 1 wakes thread 2 up and waits for the next synchronization point. Thread 2 terminates, and thread 1 never gets woken up.

What I should try is insert a call to compactWait (or some other method, maybe compactEndThread) at the very end of java.lang.Thread.exit. I’m hoping that this will actually be at the end of a thread’s life, and there won’t be any more sync points afterwards. Otherwise, this is not going to work.

This call will check if there is a thread waiting to be woken up at the current index, and if so, wake it up. Otherwise, it’s a no-op, because there should be a thread running already, about to enter compactWait and execute.

If this doesn’t work, e.g. because there are still sync points happening after java.lang.Thread.exit, then I could try some kind of “counting scheme”. I count the number of threads alive and how many of them are currently waiting. If the second number ever becomes equal to the first number, I’d have to wake a thread up. But that seems more complicated right now.

I don’t feel too well right now – felt awful yesterday, too – but I have a date. Time for dinner and a movie.

Downgrade

Ok, I was fed up with the disgusting “rich” editor in WordPress 2.0 and spent the last 90 minutes downgrading to my trusted WordPress 1.5.2.

It was nearly impossible to paste code or any other preformatted text with the rich editor: It kept swallowing end-of-lines, and quite often the cursor would just get stuck somewhere and could only be moved with the mouse. HTML tags could only be inserted in the special “HTML” view. Sure, there were nice things in WordPress 2.0, like the fancy color fades, but what am I using the most? The editor. I need a good editor.

The process of downgrading was quite painful. At first I just reinstalled the 1.5.2 software, but it wouldn’t accept the 2.0 database. Apparently, the format has changed. I had made a backup before the upgrade, of course, and could restore it without a hitch.

The last thing to do was somehow transfer the posts I have made since the upgrade. To do that, I let both software versions run concurrently, in two separate MySQL databases. I then created new postings in the 1.5.2 version, copy & pasted and hand-formatted the postings, and fixed the time stamps.

WordPress 2.0 was a very, very bad experience. It may be shiny and nice for people who just write English. But for me, the editor was just horrible. I feel like the WordPress developers deserve an angry email, but they have already wasted enough of my time.

The End is Important

When I implemented last night’s sketch, I realized that the LASTBUFFER flag actually doesn’t quite work. The problem is that, when a thread has been woken up, I don’t have to check if the sync point currently pointed to is SP_END – that is the thread that was just put to sleep. I have to look at the sync point beyond that, so a look-ahead of just 1 is not enough. Assume for a second that a look-ahead of 2 is enough. Then the following cases are possible:

  1. index-1: thread just woken up, currently running
    index  : thread just put to sleep
    index+1: SP_END
  2. index-1: thread just woken up, currently running
    index  : thread just put to sleep
    -------- end of buffer --------
    0      : SP_END
  3. index-1: thread just woken up, currently running
    -------- end of buffer --------
    0      : thread just put to sleep
    1      : SP_END

Case 3 actually doesn’t quite happen this way. The thread entering compactWait notices that a thread has been waiting for this sync point already, so it wakes it up and scans for its own sync point. It doesn’t find its sync point, so it waits for the next buffer load.

Case 1 is the easiest to handle: I can just check if the end-of-schedule marker is at index+1. Cases 2 and 3, however, would have to involve the LASTBUFFER flag. Again assuming that a look-ahead of 2 is enough, LASTBUFFER would be true if the next buffer either a) is completely empty, b) contains just SP_END, or c) contains one sync point and SP_END.

This would be easy enough to handle. Unfortunately, I don’t think it works. Assume a schedule that ends like this:

...
x  : SP_MONITOREXIT 1 (1 waits)
x+1: SP_MONITOREXIT 2
x+2: SP_MONITOREXIT 3 (3 waits)
x+3: SP_MONITOREXIT 4 (4 waits)
x+4: SP_END 0

Let us furthermore assume that threads 1, 3, and 4 have entered compactWait but weren’t supposed to run, so they are waiting for the sync points at x, x+2, and x+3. Let index = x right now.

Now thread 2 enters compactWait and notices that thread 1 has been waiting for this sync point at index. So it wakes thread 1 up and puts itself to sleep. Thread 1 increments index, so index = x+1 now. Then thread 1 exits compactWait and terminates – it was the last sync point for thread 1.

The result is that threads 2, 3, and 4 are all waiting to be woken up, but all threads are asleep, even though x+1 wasn’t the sync point immediately before SP_END. This can, of course, be extended to any number of sync points.

I think I’m on the wrong path here.

For a while, I also thought that I could just disable scheduled replay as soon as I see a thread ID 9 (“DestroyJavaVM”). I haven’t tried yet, but I’m pretty sure that thread won’t be launched until all other threads have finished. If there’s still one thread asleep, waiting for a sync point, I won’t ever get a thread ID 9. This would be the easiest solution, but I’m rather sure it won’t work.

I think I’m beginning to understand that this whole issue isn’t about the end of the schedule, it’s about the end of a thread! With our compactWait algorithm, we always have only one thread running (unless a new buffer has been loaded). If that single thread ends, compactWait won’t be entered again, and the thread waiting for the next sync point will never be woken up. The program is deadlocked.

Something needs to be done when a thread ends. I don’t quite know what that is or how it can be done, though.

Even Trickier

I’ve just realized there’s a potential problem with this algorithm too: Assume the second to last sync point of the schedule is being processed, and a thread is waiting for it. This thread will be woken up, and the current thread will wait for the last sync point to be hit. However, since it’s the last sync point in the schedule, compactWait will never be entered again. The last thread will never complete!

This isn’t really a problem in this application, though, since there are these eight sync points produced by “DestroyJavaVM” (thread 9) that don’t show up in the recorded trace. Thread 9 will enter compactWait and wake up the last thread. But then thread 9 will look for its own sync point, which is not in the schedule, so it will wait for a buffer update… which will never come.

Therefore, when a thread is woken up so it can continue, it needs to check if the next entry in the sync point array marks the end of the schedule (SP_END). If so, replay needs to be disabled and all threads waiting for a buffer update need to be woken up.

Another problem arises, unfortunately: If it’s the end of the buffer, compactWait cannot determine if the end of the schedule has been reached without reloading the buffer. But reloading the buffer needs to happen in a synchronized block.

It’s probably easiest to have another static flag, LASTBUFFER, that the master VM sets if this is the last block. compactWait will disable scheduled replay if the index is at the end of the buffer and the LASTBUFFER flag is set, or if the next sync point marks the end. Note that this “or” has to short-circuit.
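
In Java, that check might look roughly like this (all names are assumptions, mapped from the post); the point of the extra guards is that the array access may only happen while the index is still inside the buffer:

// Hypothetical sketch of the short-circuiting end-of-schedule test.
static boolean endOfSchedule(long[] buffer, int index,
                             boolean lastBuffer, long spEnd) {
    // && and || short-circuit, so buffer[index] is never read out of bounds
    return (index >= buffer.length && lastBuffer)
        || (index < buffer.length && buffer[index] == spEnd);
}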

So, without further ado, another sketch of the algorithm:

  1. Repeat this… (“RETRY” loop)
    1. Begin of synchronized block
      1. If the buffer has never been loaded, or if the index is at the end of the buffer…
        1. If the LASTBUFFER flag is set…
          1. Disable scheduled replay.
          2. Wake up all threads waiting for a buffer update.
          3. Break out of the “RETRY” loop.
        2. Otherwise…
          1. Load the buffer and reset the indices.
          2. Wake up all threads waiting for a buffer update.
      2. If the current sync point marks the end of the schedule…
        1. Disable scheduled replay.
        2. Wake up all threads waiting for a buffer update.
        3. Break out of the “RETRY” loop.
      3. If there’s a thread waiting for this wait array entry…
        1. Notify it.
        2. Set a flag to scan ahead for the current sync point (“SCAN”).
        3. Do not advance indices.
      4. Otherwise…
        1. If the current sync point in the array is the right one…
          1. Advance indices.
          2. Allow the thread to exit the “RETRY” loop.
        2. Otherwise…
          1. Set a flag to scan ahead (“SCAN”).
      5. If the “SCAN” flag is set (i.e. either a thread was woken up or the current sync point did not match)…
        1. Look for the sync points in the remaining part of the array.
        2. If it could be found…
          1. Insert a new Object() into corresponding slot of the wait array.
          2. Set that Object() as “wait object” for this method.
          3. Allow the thread to exit the “RETRY” loop.
        3. Otherwise…
          1. Set the new buffer wait object as “wait object” for this method.
          2. Do not allow the thread to exit the “RETRY” loop.
    2. End of synchronized block
    3. If a “wait object” has been set for this method…
      1. Call wait() on that object (make sure to continue waiting if interrupted).
      2. If scheduled replay is still enabled and the thread is allowed to exit the “RETRY” loop (i.e. it was not a “new buffer” wait)…
        1. Advance the indices.
        2. If the index is at the end of the buffer and the LASTBUFFER flag is set, or the sync point at the new index marks the end of the schedule (note: this must short-circuit)…
          1. Disable scheduled replay.
          2. Wake up all threads waiting for a buffer update.
          3. Break out of the “RETRY” loop.
  2. …while the thread is not allowed to exit the loop and scheduled replay is enabled (end “RETRY” loop).

Definitely Tricky

I just noticed that my algorithm still isn’t correct. Right now, when compactWait finds a thread that is waiting, it wakes it up and continues with the current thread if the next sync point matches. Then we have two threads running concurrently. Instead, when a waiting thread is woken up, the current thread must wait.

It seems like this can be accomplished most easily by not advancing the indices if a thread was woken up, and then entering the “scan ahead” portion. The indices are advanced only after a thread waiting on a sync point has been woken up again.

This has the effect that when a thread was woken up, the scan for the current thread starts with the next sync point, and the current thread will wait there. The newly awoken thread then advances the indices, and the current thread will be woken up the next time a sync point is encountered.

This seems to be the algorithm now:

  1. Repeat this… (“RETRY” loop)
    1. Begin of synchronized block
      1. If the buffer has never been loaded, or if the index is at the end of the buffer, reload buffer and notify all threads waiting for a buffer update.
      2. If there’s a thread waiting for this wait array entry…
        1. Notify it.
        2. Set a flag to scan ahead for the current sync point (“SCAN”).
        3. Do not advance indices.
      3. Otherwise…
        1. If the current sync point in the array is the right one…
          1. Advance indices.
          2. Allow the thread to exit the “RETRY” loop.
        2. Otherwise…
          1. Set a flag to scan ahead (“SCAN”).
      4. If the “SCAN” flag is set (i.e. either a thread was woken up or the current sync point did not match)…
        1. Look for the sync points in the remaining part of the array.
        2. If it could be found…
          1. Insert a new Object() into the wait array.
          2. Set that Object() as “wait object” for this method.
          3. Allow the thread to exit the “RETRY” loop.
        3. Otherwise…
          1. Set the new buffer wait object as “wait object” for this method.
          2. Do not allow the thread to exit the “RETRY” loop.
    2. End of synchronized block
    3. If a “wait object” has been set for this method…
      1. Call wait() on that object (make sure to continue waiting if interrupted).
      2. If the thread is allowed to exit the “RETRY” loop (i.e. it was not a “new buffer” wait)…
        1. Advance the indices.
  2. …until the thread is allowed to exit the loop (end “RETRY” loop).

Tricky Bit of Code

The wait algorithm is turning out to be trickier than expected. I think I have answers for the two questions I posed a few days ago. Let’s go in reverse order, actually:

  1. When one thread has to wait inside compactWait, which is a synchronized method, can other threads even enter it?

I’m almost embarrassed I had to ask this. Of course, other threads cannot enter compactWait at the same time. The current thread relinquishes control of the object it is waiting on (either the one in the wait array or the one for a new buffer), but it doesn’t relinquish control of any other monitors it holds. This, of course, means that the entire method cannot be synchronized. Certain portions, however, do have to be synchronized, as explained below.
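
A tiny stand-alone demo of this point (all names made up): the spawned thread waits on cond but keeps holding lock, so the main thread blocks forever when it tries to enter its own synchronized(lock) block.

// Demonstrates that wait() releases only the monitor of the object being
// waited on; any other monitors the waiting thread holds stay locked.
public class HoldDemo {
    static final Object lock = new Object();
    static final Object cond = new Object();

    public static void main(String[] args) throws InterruptedException {
        Thread t = new Thread(new Runnable() {
            public void run() {
                synchronized (lock) {
                    synchronized (cond) {
                        try { cond.wait(); }          // releases cond, NOT lock
                        catch (InterruptedException e) { }
                    }
                }
            }
        });
        t.start();
        Thread.sleep(100);        // crude: give t time to reach wait()
        synchronized (lock) {     // deadlocks here: t still owns lock
            System.out.println("never reached");
        }
    }
}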

  1. When threads are notified that a new buffer has been loaded, they might all work with the list of sync points simultaneously. This might create race conditions. Are these important?

It’s true that the sync points are unique: there will (or at least should) never be two threads looking for the same sync point. However, all threads manipulate the indices into the sync point and wait arrays concurrently, so I think this actually is an issue.

When threads resume after waiting for a new buffer, they should not run through the method concurrently; otherwise, they might, for example, dispatch a waiting thread twice, or attempt to do so but cause a NullPointerException because the entry in the wait array was non-null, but another thread set it to null before the first thread could call notify(). Another really important thing is that buffer reloads are synchronized; otherwise, two threads could concurrently try to reload the buffer, and nasty things would happen.

This pretty much leads to the following structure:

  1. Repeat this… (“RETRY” loop)
    1. Begin of synchronized block
      1. If the buffer has never been loaded, or if the index is at the end of the buffer, reload buffer and notify all threads waiting for a buffer update.
      2. If there’s a thread waiting for this wait array entry, notify it and continue with the next sync point.
      3. If the current sync point in the array is the right one…
        1. Advance indices.
        2. Allow the thread to exit the “RETRY” loop.
      4. Otherwise…
        1. Look for the sync points in the remaining part of the array.
        2. If it could be found…
          1. Insert a new Object() into the wait array.
          2. Set that Object() as “wait object” for this method.
          3. Allow the thread to exit the “RETRY” loop.
        3. Otherwise…
          1. Set the new buffer wait object as “wait object” for this method.
          2. Do not allow the thread to exit the “RETRY” loop.
    2. End of synchronized block
    3. If a “wait object” has been set for this method, call wait() on that object (make sure to continue waiting if interrupted).
  2. …until the thread is allowed to exit the loop (end “RETRY” loop).

The most important change here is that the entire method is not synchronized anymore, but the entire block that does notification and scanning is. The waiting happens outside the synchronized block. This allows other threads to run through the block when threads are waiting.

I’m writing a piece of code that simulates the behavior outside of the actual program right now. I should probably write unit tests, hm?

Happy New Year

Happy New Year! I’m in the process of upgrading to WordPress 2.0. So far, I regret the decision. The List Manager doesn’t work anymore, so I’ve had to replace all lists with inlined versions, and the LiveCalendar plugin doesn’t work anymore either.

Don’t Know, But Managed Anyway

I’ve thought about where these sync points come from over the last few days. I still can’t exactly figure it out.

However, I decided to just enable recording at the beginning of the main method, just like I do with replay. Now the sync points line up. There are two sync points that don’t belong to the ones I’m interested in:

1 0 // 000030b6 00000002 sun.misc.Launcher$AppClassLoader.loadClass(Ljava/lang/String;Z)Ljava/lang/Class;
2 0 // 000030b6 00000002 sun.misc.Launcher$AppClassLoader.loadClass(Ljava/lang/String;Z)Ljava/lang/Class;

But these occur during both recording and replaying, so that’s not a problem. I still can’t understand why the sync points didn’t match earlier: I did record them, but tagged them as “comments”, and I switched from comment to real sync point at exactly the point where I now enable recording. Fortunately, it seems like I don’t have to dig into this mystery right now…

I modified the logs that the CompactDebugProcessor generates to make them compatible with the other logs. As you can see above, all the annotations about class and method index and the deciphered method signature are now behind // as comments. Therefore, the replay application can parse them without modification.

I’ve begun to design the replay scheduler now. It’s turning out to be more complicated than expected. Here’s an outline (a sketch of the scan step follows the list):

  1. If the buffer has never been loaded, or if the index is at the end of the buffer…
    1. Load the buffer.
    2. Reset the index.
    3. Notify all threads waiting on the buffer reload Object().
  2. If there’s a thread waiting for this sync point…
    1. Notify the thread so it can continue to run.
    2. Move on to the next sync point.
  3. Repeat this… (RETRY loop)
    1. If the current sync point is the one we look for…
      1. Move on to the next sync point.
      2. Allow the thread to exit the “RETRY” loop.
    2. Otherwise…
      1. Loop over all sync points, starting with the one after the current sync point… (BUFFER loop)
        1. If this is the sync point we are looking for…
          1. Insert a new Object() into the wait array.
          2. Call wait() on that object (make sure to continue waiting if interrupted).
          3. …other threads will continue…
          4. When notified, break out of the “BUFFER” loop and allow the thread to exit the “RETRY” loop.
        2. Otherwise, if this sync point has the same thread ID as the sync point we’re looking for, but a different event code…
          1. Issue a warning: Somehow a thread has missed a sync point.
          2. Undefined behavior after this; maybe disable scheduled replay.
      2. End “BUFFER” loop.
      3. If the “BUFFER” loop went through the entire rest of the buffer without finding the sync point…
        1. Call wait() on the buffer reload Object() (make sure to continue waiting if interrupted).
        2. …other threads will continue…
        3. When notified, do not allow thread to exit “RETRY” loop.
  4. …until the thread is allowed to exit the loop (end “RETRY” loop).
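
As mentioned above, here is a rough Java sketch of the scan in the “BUFFER” loop (the (code, tid) pair layout and all names are assumptions):

// Hypothetical scan over the remaining (code, tid) pairs in the buffer;
// returns the index of the matching pair, or -1 if the thread has to wait
// for the next buffer load (or if a sync point was missed).
static int scanAhead(long[] buffer, int start, long code, long tid) {
    for (int i = start; i + 1 < buffer.length; i += 2) {
        if (buffer[i] == code && buffer[i + 1] == tid) {
            return i;   // found: insert a wait object here and wait on it
        }
        if (buffer[i + 1] == tid) {
            // same thread ID but different event code: a sync point was missed
            System.err.println("Warning: thread " + tid + " missed a sync point");
            return -1;  // behavior undefined; maybe disable scheduled replay
        }
    }
    return -1;          // not in this buffer: wait on the buffer reload object
}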

The questions I have on my mind right now are these:

  1. When threads are notified that a new buffer has been loaded, they might all work with the list of sync points simultaneously. This might create race conditions. Are these important? My gut feeling tells me no, because each thread is only looking for one sync point, so the code-threadID pairs may be treated as unique.
  2. When one thread has to wait inside compactWait, which is a synchronized method, can other threads even enter it?

I think I will create a small-scale model outside the entire system to investigate these issues. Now it’s dinner time.
