No Static Random in SyncPointBuffer

Last night I was doing some refactoring to change the AInsertBeforeOpcodeStrategy into a more general strategy that can insert both before and after an opcode, based on four predicates that determine whether the class should be instrumented (1), whether the method in the class should be instrumented (2), and finally whether code should be inserted before (3) and after (4) a particular opcode in the method.

When I ran my tests, I noticed that recording wasn’t working anymore. I tracked it down to my use of the java.util.Random class in SyncPointBuffer. Somehow I’m not allowed to reference it there, or that early in the VM initialization. Using Math.random(), however, is ok. I find that weird: I just don’t really see the difference. The java.lang.Math class has a static Random field, just like my class did.

This is going to make it harder to display the seed and set the same seed again for future runs. Maybe I can somehow put it in a different class. Strictly speaking, the delayed execution stuff doesn’t have much to do with a “sync point buffer” anyway, and I only put the methods in the SyncPointBuffer class for convenience.

Share
Posted in Concurrent Unit Testing | Leave a comment

Print This Post Print This Post  

Simple Test for Delay Testing

Even though I haven’t implemented inserting the delays at most of the sync points, i.e. not at monitorenter, monitorexit, Object.wait, Object.notify, Object.notifyAll, and Thread.join, I have already written my first test. I only have delays at Thread.start, Thread.run, Thread.exit, but I wanted some buggy program to test the idea already.

It was actually pretty hard to come up with a simple program that was incorrectly synchronized and whose flaw would be detected by execution with random delays. Most flaws that came to my mind were race conditions without synchronization. Here, however, we assume that the program is race-free.

The program that I came up with now uses two threads, and one thread assumes that the other thread has finished its work, but this dependency is not expressed in code (using a Thread.join).

import junit.framework.TestCase;
/**
* A multithreaded test case exhibiting a problem with synchronization:
* One thread is dependent on the completion of another thread's work,
* but this is not expressed in code.
*/
public class SyncProblem2 extends TestCase {
private Character lock = new Character('1');
private volatile int sharedInt = 0;
public void testUnexpressedDependency() {
Thread worker = new Thread(new Runnable() {
public void run() {
System.out.println("Worker thread is running");
lengthyProcessing("Worker thread", 50, '+');
synchronized(lock) {
sharedInt = sharedInt + 1;
System.out.println("sharedInt == 1 == "+sharedInt);
}
}
});
worker.start();
lengthyProcessing("Main thread", 10000, '.');
synchronized(lock) {
sharedInt = sharedInt - 1;
System.out.println("sharedInt == 0 == "+sharedInt);
assertEquals("The shared integer should be back to 0",
0, sharedInt);
}
}
private void lengthyProcessing(String threadName, int iterations,
char progressChar) {
System.out.print(threadName+" is doing work");
for(int i=0; i

The two threads share a variable int sharedInt that starts out at 0. Access to it is protected by the field lock, so there aren't any race conditions (not that they'd matter here anyway).

The worker thread spends some time doing something (simulated by the lengthyProcessing method), then increments sharedInt, and finishes.

The main thread starts the worker thread, also spends some time doing something, and then decrements sharedInt. The amount of work is set up so that under normal conditions, the worker thread will have finished and the value of sharedInt will be 1. The main thread asserts that after decrementing sharedInt, its value is back at 0.

I have added two targets, run-normal and run-delay to my Ant script to run programs and test cases more easily. If I'm using the former target, the program executes as described:

mgricken@manifold ~/research/Concutest/ClassLoader
$ ant run-normal -Dtest-class-name=SyncProblem2
...
BUILD SUCCESSFUL
Total time: 1 second

With run-delay, however, delays get inserted, in particular at the beginning of the worker thread's run method. This delays incrementing sharedInt long enough so that the main thread can decrement it first, resulting in a value of -1 and a failed assertion:

mgricken@manifold ~/research/Concutest/ClassLoader
$ ant run-delay -Dtest-class-name=SyncProblem2
...
    [junit] Testcase: testUnexpressedDependency(SyncProblem2):  FAILED
    [junit] The shared integer should be back to 0 expected:<0> but was:<-1>
    [junit] junit.framework.AssertionFailedError: The shared integer should be back to 0
            expected:<0> but was:<-1>
    [junit]     at SyncProblem2.testUnexpressedDependency(SyncProblem2.java:33)
BUILD FAILED
/Users/mgricken/Documents/Research/Concutest/ClassLoader/build.xml:708: Test SyncProblem2 failed

This is just a really simple example, though. I need more examples with more complex flaws, especially when I add delays to more places. I also need to figure out how long these delays should be on average. Unfortunately, I don't think there's a simple answer. I also need to think about which situations count as "concurrency" some more:

  1. several threads actually running at the same time
  2. several threads being spawned in the same test
  3. any reference to the Thread class in the code

The first one would be the most restrictive notion and accept many tests as non-concurrent, just because by chance threads didn't overlap; the last one would be very conservative and treat threads as potentially concurrent just because they use threading code. I hope I'll get a chance to discuss this with Corky soon.

Share
Posted in Concurrent Unit Testing | Leave a comment

Print This Post Print This Post  

Constants Not Configurable

I’ve tried to implement what I described below: I had an interface with a bunch of public static final fields with the constants. Then I referenced these fields from elsewhere in the code and planned to just append a different version of the same interface to the beginning of the classpath.

The problem is that javac performs constant propagation. It recognizes that these constants are final and therefore inserts the literal value instead of using a reference to the constant field. When I remove the final modifier and turn the interface into a class, then the VM initialization craps out.

Looks like I have to come up with a different idea. Unfortunately these constants are used in many places, so patching them directly isn’t really an option.

Share
Posted in Concurrent Unit Testing | Leave a comment

Print This Post Print This Post  

Making Constants Configurable

There are several constants that a user is likely going to want to change, like the size of the buffers for recording, or the minimum and maximum length of the delays during execution with random delays.

These constants are typically found in classes like SyncPointBuffer and other classes that first get put in the cunitrt.jar file. This file contains all the classes that make up the least dependency fixpoint of the classes required for the Concutest runtime. After it has been created, it gets merged with the rt.jar file during instrumentation and then put on the boot classpath so it’s available right from the beginning during VM initialization.

The problem is that instrumentation is a rather lengthy process (it takes a few minutes), so I want to avoid running a full instrumentation just because a number, like the delay in milliseconds, has changed. That means I have to store these constants externally somehow, outside the rt.jar file (or update the rt.jar file).

The normal way to do this would probably to use a file, maybe a text file. My XML configuration library could easily deal with this. However, since this is happening very often and — more importantly — very early during VM initialization, I want the process to be especially light-weight, especially easy for the VM executing the program.

What’s the easiest way for the VM to get a value? The easiest way is to simply read a value out of a public static final field, or more technically, to load a value out of the constant pool. That’s exactly what’s going on right now, when all these values are hard-coded.

In fact, it cannot be done more easily. Any retrieval from a file, for example, would require the VM to load the name of the file out of the constant pool, and that is already the work necessary to retrieve a constant, so reading the value from a file has to be more work.

But how do I make these constants tweakable? I have decided to create a separate class (an interface, to be precise), edu.rice.cs.cunit.ConcutestConstants, that contains all tweakable constants. It is also included in the cunitrt.jar file and therefore in the rt.jar file during instrumentation. Nothing has really changed right now.

What I can do now, though, is create a copy of that class that lives outside of the rt.jar file and that has different values than the one inside the rt.jar file. If I put this class at the beginning of the boot classpath, then the class outside with the different values will be found first. Creating that modified class file is a little bit more work for the launcher program than writing to a text file or using Java attributes and System.getProperty, but this approach is as light-weight as it can be for the executing program.

The battery on my MacBook (now with 2 GB RAM, thank you, Corky!) is about to run out, so I’ll get back to implementing this on my workstation at home.

Share
Posted in Concurrent Unit Testing | Leave a comment

Print This Post Print This Post  

Writing about the Summer School

Here are a few links to other people’s blogs about the summer school:

I wish I had something more coherent to say, but for me, the summer school, my work, and DrJava are completely intertwined.

Share
Posted in Research | Leave a comment

Print This Post Print This Post  

Thread.run, not Thread.start

I am reusing code written for the other instrumentors (and in my previous posts was definitely influenced by them), and one of the instrumentors that I copied and modified was CompactThreadStartStrategy. It now inserts a call to SyncPointBuffer.delayThreadStart towards the end of Thread.start. That sort of works.

However, I just realized that Thread.start is executed by the parent thread that is starting the other thread, not by the thread actually started. I think what I really want is a delay at the beginning of the Thread.run method. It may make sense to delay thread start as well, though, so I will keep DelayThreadStartStrategy and add DelayThreadRunStrategy.

Share
Posted in Concurrent Unit Testing | Leave a comment

Print This Post Print This Post  

Only Change Counter at Thread Start and Exit?

Right now, I’m actually not so sure that I really want to decrement the number of running threads before a monitorenter, Object.wait or Thread.join. A thread could reach a monitorenter, check the counter and think there is only one active thread, but right after the check was made and before the monitorenter is actually reached another thread continues to run, leaves its monitorenter and increments the count. In this case, no delay would be executed, even though I think there should be one because of potential subtle interactions between the two threads. The same situation could happen twice, and then both threads are unaware that there actually is concurrency in the program. I think I am just going to increment and decrement in Thread.start and Thread.exit.

Another suggestion Corky made that I just remembered was that when no threads are spawned and there was no concurrency, then the test should indicate that somehow to reduce the number of repetitions of that test, perhaps even not repeating the test at all. That’s not really conservative, though. but yet again, it might be sufficient.

I also need to come up with a few good examples of synchronization bugs that this technique can catch.

Share
Posted in Concurrent Unit Testing | Leave a comment

Print This Post Print This Post  

Random Delays at Sync Points

At our last meeting, Corky suggested another approach for the first cut of our test framework. Instead of going for the full record-and-replay framework that works with fixed schedules that allow bugs to be reproduced, I should work on a version that inserts random delays at synchronization points. This would sort of be a derivative the heuristic version he initially proposed and that should be used when there are too many schedules to run them all. It has drawbacks, namely that it does not try all schedules and that a schedule which lead to an error cannot be reproduced, but it does not require the replay mechanism that I’ve just had so many difficulties with.

After making some more small improvements on the deadlock detector yesterday and earlier today, I’ve started to take a look at what would be necessary to make this new approach work. The deadlock detector isn’t finished either. I still have to make it work on bigger programs.

Which Synchronization Points to Watch?

I need to consider the following synchronization points (synchronized methods get converted to unsynchronized methods with a synchronized block again):

  • monitorenter
  • monitorexit
  • Object.wait
  • Object.notify
  • Object.notifyAll
  • Thread.join
  • Thread.start
  • Thread.run (updated)
  • Thread.exit

Where to Insert the Delays?

Where should the delays be inserted? For monitorenter, there should definitely be a delay before the block is entered: At that time, any thread can still acquire the lock and dramatically change program execution. Conversely, for monitorexit, there should probably be a delay after the block is exited: At that time, the thread has relinquished control of the lock and another thread might grab it. I’m not really sure this delay is necessary, though, since anything important should have a monitorenter before it anyway, so if the same thread that just released a lock were to acquire another one, there would be another delay anyway. monitorexit nonetheless is a place where the synchronization sequence can easily change, so I’m going to put a delay there too. It might also be prudent to insert a delay after the monitorenter to expose situations in which a lock is used but it is ineffective.

The same that applied to monitorenter also applies to Object.wait, which can be thought of as a generalized version in which any thread — not just the JVM — can wake up any other thread. I will also handle Thread.join that way: It’s sort of a “wake-up by thread death”.

For Object.notify and Object.notifyAll, I am going to insert a delay before the call to decouple the notification order from the order in which the threads doing the notifications execute. Right now I don’t think a delay after the call is necessary.

Finally, Thread.start and Thread.exit will have delays after the thread start and before the thread death, respectively. Here’s a summary:

Delays before:

  • monitorenter
  • Object.wait
  • Object.notify
  • Object.notifyAll
  • Thread.join
  • Thread.exit

Delays after:

  • monitorenter
  • monitorexit
  • Object.wait
  • Thread.join
  • Thread.start
  • Thread.run (updated)

How to Determine the Lengths of the Delays?

The lengths should be random numbers with an upper and a lower bound, e.g. between 50 ms and 1000 ms. I still have to think about how I am going to generate these random numbers. It’s not important that the numbers appear very random, but the generation should be efficient. I guess just using Random.nextInt(int n) is fast enough, but I’ve also considered using a pre-generated array of delay lengths. That way, getting the next random number could be sped up. I don’t think that’s necessary, though.

The seed that is being used for a particular execution should be determined randomly or by the user. If it was chosen randomly (by using the time; Java adds to System.nanoTime()), then it should be printed out or stored somewhere. If errors are found, the user could than reuse that seed to generate the same delays. That would not guarantee that the same sequence of synchronization points is generated again, but it should strongly favor some schedules over others.

I think I can just use Thread.sleep for the delays. It’s a native method and may have an impact on scheduling, i.e. it might immediately switch to another thread, but that’s ok. In fact, it is probably better than a tight for loop that causes the thread to hog the CPU.

When to Delay?

I came up with three different strategies for determining when to delay the program (the section above that lists the synchronization points describes where to delay). These strategies range from simple (and inefficient) to complex (but perhaps more efficient):

  1. Always delay.
  2. Delay if more than one thread is running.
  3. Delay if more than one thread is running and the current thread does not already hold the lock.

Here are descriptions of the three strategies:

Strategy 1: Always delay.

When the program reaches one of the synchronization points and there is a delay (before it, after it, or both before and after it), that delay is always executed. As an example, here is what monitorenter would look like in pseudocode:

sleep(getNextRandomDelay());
monitorenter;
sleep(getNextRandomDelay());

Strategy 2: Delay if more than one thread is running.

If there is only one thread that is currently running, then there is no point in delaying the execution. The program therefore keeps a counter public static volatile long _threadsRunning; in the SyncPointBuffer class. The counter gets changed at the synchronization points in the following way:

  • Thread.start: increment
  • Thread.exit: decrement
  • before monitorenter: decrement
  • after monitorenter: increment
  • monitorexit: no change
  • before Object.wait: decrement
  • after Object.wait: increment
  • before Thread.join: decrement
  • after Thread.join: increment

The random delays are then only executed if the counter indicates there is more than one active thread (i.e. the counter is 0 or 1, depending on the sync point and the location of the delay). As an example, here is monitorenter in pseudocode:

if (\_threadsRunning>1) { sleep(getNextRandomDelay()); }
synchronized(SyncPointBuffer.class) {
\_threadsRunning = \_threadsRunning - 1;
}
monitorenter;
synchronized(SyncPointBuffer.class) {
\_threadsRunning = \_threadsRunning - 1;
}
if (\_threadsRunning>1) { sleep(getNextRandomDelay()); }

Unfortunately, this code will get a little more complicated: Thread.sleep throws a checked InterruptedException which I need to catch, but I can just do that in a wrapper method.

The code to increment and decrement the counter needs to be synchronized because it involves both a read and a write, but I have to make sure the lock is released before the call to sleep, otherwise the delay would not allow other interleavings and the whole exercise would be pointless.

Strategy 3: Delay if more than one thread is running and the current thread does not already hold the lock.

Program execution can be sped up by using strategy 2, but not invoking a delay at a monitorenter if the thread already owned the lock. This often happens in recursive calls, e.g.

void foo(int n) {
if (n>0) {
synchronized(this) {
// ...
foo(n-1);
}
}
}

The synchronized block is entered n times, always with the same object, so the lock only has to be acquired and released once; all the monitorenter instructions except for the first and all monitorexit instructions except for the last are no-ops.

To figure out if a thread already owns a lock, the program has to maintain a set of locks for every thread. To not interfere with garbage collection, it’s probably a good idea to use object IDs and a set of the object IDs of the owned locks.

This requires a lot more effort just for bookkeeping than a simple counter, and object IDs are still a complicated issue. Considering that strategy 2 isn’t a lot harder to implement than strategy 1, but doesn’t require object IDs or abstract data structures, I am going to implement strategy 2. Perhaps I am going to implement strategy 1 first, though.

How to Run the Program?

Except for the seed value for the random number generator, I don’t need to report any data. The important behavior is a simply a failure of a unit test in one of the schedules. I will therefore try to make it possible to run the program in just one VM without an attached monitor program using JPDA. That should make execution a lot more efficient.

This is somewhat of a return to my implementation approach from the beginning of the project and may fail, but I think it’s worth the effort to reevaluate. I’ll try to make the instrumentors compatible with the other instrumentors I have, so a second VM can be used if the user wants to.

For now, I’ll also opt for an offline instrumentation, i.e. I will use the FileInstrumentor program and not a class loader for dynamic instrumentation.

Corky has also suggested only instrumenting user classes and not the entire rt.jar file. For record and replay, I don’t think that was enough, but for this approach it should be sufficient.

Share
Posted in Concurrent Unit Testing | Leave a comment

Print This Post Print This Post  

New DrJava Beta

Today we released a new beta version of DrJava (drjava-beta-20060729-1410). The last beta before that was from June 10 (drjava-beta-20060610-0349), and the current stable release dates back to January 27 (drjava-stable-20060127-2145). Unless we find some bugs that absolutely need to be fixed, we are planning on releasing a new stable version before the start of the next semester on August 28. Until then, we’ve decided to concentrate on fixing bugs and not add new features.

Today’s beta contains a browser-like history that remembers the locations of document switches, jumping to errors, bookmarks, and so on and allows the user to move back and forth between them. I believe this is the only major feature that we have added. The beta also benefits from Corky’s work on removing concurrency problems, especially on dual core machines. I also fixed a lot of bugs, even though most were pretty minor:

I think I also got [1501897] Running program in debug mode resets interactions, which was a very serious problem that pretty much prevented the use of the debugger on my MacBook. The MacOS problems I reported (which I believe had the same problem as the original bug report) are definitely gone, but since I’ve never been able to reproduce the problem, I haven’t closed the bug report yet. Barnaby said he used to be able to reproduce it, but I haven’t heard from him since Monday.

We also investigated [1511641] working directory not quite right yet (though we decided to remove the “working directory” preference because it was misleading), [1505560] Inconsistent use of project root in classpath and [1505515] Javadoc with long classpaths, but we don’t think the requested changes would improve the program.

Along the way, we also did other improvements, of course. This time, only changes by Corky and me made it into the release, but Dan and Barnaby have been working behind the scenes on improving DynamicJava and the reduced model.

As always, there are still some rough edges with DrJava, but I’m pretty happy about this release. I think it’s quite stable.

Share
Posted in DrJava | Leave a comment

Print This Post Print This Post  

Object ID Update

A while ago I analyzed problems with assigning object IDs. Back then, I had the following symptoms:

  1. java.lang.Object: always -1, can’t hold a field
  2. java.lang.String: always -1, can’t hold a field
  3. java.lang.Class: always 0, $$$objectID$$$ field doesn’t get initialized
  4. arrays: always -1

Since then, I have changed the base class implementation to use the identity hashcode instead of -1 to signal if an object ID isn’t available. To distinguish object IDs and identity hashcodes, object IDs are now negative. Let’s look at what the same code generates now:

0 3 // 000030e4 0003 0018   22413802 sun.misc.Launcher$AppClassLoader.loadClass
0 1 // 000030e4 0003 0018   22413802 sun.misc.Launcher$AppClassLoader.loadClass
0 2 // 000030e4 0003 006c   22413802 sun.misc.Launcher$AppClassLoader.loadClass
0 3 // 000031d7 0002 001a   18262862 ObjectIDTest.main
0 1 // 000031d7 0002 001a   18262862 ObjectIDTest.main
0 2 // 000031d7 0002 0047   18262862 ObjectIDTest.main
0 3 // 000030e4 0003 0018   22413802 sun.misc.Launcher$AppClassLoader.loadClass
0 1 // 000030e4 0003 0018   22413802 sun.misc.Launcher$AppClassLoader.loadClass
0 2 // 000030e4 0003 006c   22413802 sun.misc.Launcher$AppClassLoader.loadClass
0 3 // 000031d7 0002 0085   21539363 ObjectIDTest.main
0 1 // 000031d7 0002 0085   21539363 ObjectIDTest.main
0 2 // 000031d7 0002 00b2   21539363 ObjectIDTest.main
0 3 // 000031d7 0002 00ed    2570525 ObjectIDTest.main
0 1 // 000031d7 0002 00ed    2570525 ObjectIDTest.main
0 2 // 000031d7 0002 011a    2570525 ObjectIDTest.main
0 3 // 000031d7 0002 015d   26867996 ObjectIDTest.main
0 1 // 000031d7 0002 015d   26867996 ObjectIDTest.main
0 2 // 000031d7 0002 018a   26867996 ObjectIDTest.main
0 3 // 000030e4 0003 0018   22413802 sun.misc.Launcher$AppClassLoader.loadClass
0 1 // 000030e4 0003 0018   22413802 sun.misc.Launcher$AppClassLoader.loadClass
0 2 // 000030e4 0003 006c   22413802 sun.misc.Launcher$AppClassLoader.loadClass
0 8 // 00003188 0001 0016          0 java.lang.Character.
0 6 // 00003188 0001 0016          0 java.lang.Character.
0 7 // 00003188 0001 005b       -443 java.lang.Character.
0 8 // 00003188 0001 0016          0 java.lang.Character.
0 6 // 00003188 0001 0016          0 java.lang.Character.
0 7 // 00003188 0001 005b       -444 java.lang.Character.
0 3 // 000031d7 0002 01db       -443 ObjectIDTest.main
0 1 // 000031d7 0002 01db       -443 ObjectIDTest.main
0 3 // 000031d7 0002 020d       -444 ObjectIDTest.main
0 1 // 000031d7 0002 020d       -444 ObjectIDTest.main
0 2 // 000031d7 0002 023b       -444 ObjectIDTest.main
0 2 // 000031d7 0002 0275       -443 ObjectIDTest.main
0 5 // 000031aa 0015 0032          0 java.lang.Thread.exit

Now all locks are distinguishable, though not always by using object IDs. The only zeros here in this recording occur when object IDs are just being assigned. I can probably still fine-tune this and let a few more classes have object IDs, but this is a major improvement already.

As the occurrence of of sync points with codes 6, 7 and 8 may already suggest, assigning object and thread IDs now generate different sync points for their monitorenter and monitorexit opcodes (6 through 8 for object IDs, 9 through 11 for thread IDs) , as previously suggested. Another difference is the presence of the TRYMONITORENTER sync point before the monitorenter instruction, which makes deadlock detection possible again, but that has been there for a while.

Share
Posted in Concurrent Unit Testing | Leave a comment

Print This Post Print This Post  

Concutest and JDK 1.6.0

I just installed Beta 93 of the JDK 1.6.0 (“Mustang”), mainly because Corky suggested it for tests with DrJava, but I also ran my file instrumentor and recorded sync points with debug information. From what I can tell now it works without a hitch. Nice…

Share
Posted in Concurrent Unit Testing | Leave a comment

Print This Post Print This Post  

Don’t Touch Apple’s Number

I finished my divide-and-conquer trial-and-error analysis of what Java runtime classes I may not instrument in Apple’s VM, and apparently the problematic class (in addition to java.lang.Object and java.lang.String, which were also restricted on the Sun VM) is java.lang.Number. How weird. What’s special about that class?

Update

Looking back, this question seems silly: Apple probably optimized the representation of Numbers, just like Sun did with Strings.

Share
Posted in Concurrent Unit Testing | Leave a comment

Print This Post Print This Post  

Summer School Wrap-Up

The 12 days of summer school are over already. I had a great time: I learned a few new techniques, realized several times that I’m on the right track, and met many interesting people. Some of them, I’m sure, will accompany my career for a while. A long while, hopefully.

What is there to report since my last posting? I took a day off to fix a number of DrJava bugs. There are still one or two bigger ones left, bu I think we can release a new version within a week. I think just today, I found a rather serious timing bug in the debugger code.

I (kind of) figured out what was going on with the zero-filled sync point logs: For some reason, sometimes when I do Event.toString() on a JDI event, it breaks the system. So I took that out; it was for debug logging anyway. At least I hope this was the reason and not just a weird hack, because I don’t see a reason why calling toString() should break the system in such a way anyway.

I’m done creating headless versions of the FileInstrumentor, Record and the RecordLauncher, and of Replay and the ReplayLauncher. The headless versions are a lot faster.

I didn’t continue my attempts to exactly figure out what is the problem with assignming object IDs on the Mac. I’m pretty sure that adding the $$$objectId$$$ field to some class is the problem, though.

I have realized that there is a fundamental problem with my approach: Whenever I’m putting sync points in the buffer, I’m introducing a new reference to a class. That means that at some point this class will have to be loaded by the class loader, and this introduces sync points that previously weren’t there. This is a chicken-and-the-egg problem and even resembles the Uncertainty Principle a little: In order to observe the sequence of synchronization points, I am forced to change the sequence, which prevents me from observing the unaltered sequence. Not putting in my own classes preserves the sequence of sync points, but I don’t have the means to observe it.

I don’t think this problem will be as bad as it sounds, though, because in practice, my new classes should be loaded very early on. That means that 1) the sync points occur during the VM initialization, and we don’t consider the sync points in that phase important anyway, and 2) it should be relatively easy to recognize the additional sync points so they can be filtered out. I’m a litte more worried that the class loader that does the instrumentation will also introduce similar sync points throughout the program’s execution, though.

I have also realized that the monitorenter and monitorexit instructions in the code that assigns object and thread IDs do need to be recorded and replayed in the correct order (otherwise we end up with different IDs, and then there’s no point in having IDs at all), but since these instructions were added by the instrumentation, they should not be treated the same as other monitorenter and monitorexit sync points. I’ll introduce special codes. (Update: I have also just realized that the program tries to pass the object and thread IDs for these monitorenter and monitorexit instructions even though they clearly have not been assigned yet, since the synchronization is protecting just those assignments! I should definitely use special codes.)

I have thought a little more about how to use the identity hashcode as a fallback. That value is a 32-bit int, and at least so far it always seems to be non-negative. The easiest way to put both an object ID and the hashcode in the same long is probably to use the unmodified identity hashcode (0, 1, 2, …) and negative object IDs (-1, -2, -3, …). That gives me 2^63 unique object IDs.

It’s not too bad if the object IDs wrap around and I start getting the same ones again (even though they should probably wrap from from -(2^63) back to -1): The deadlock detector may just issue more false positives, but those can be discerned as such later. The program will just run slower.

I still haven’t finished tying the deadlock detector to the compact scheme. I think I’ll have to redo more of it than I thought. Right now, it’s very heavily dependent on the “heavy” object scheme that I initially used. I’ll still do a detailed analysis at the object level using ThreadReferences, but only to remove false positives when the compact scheme has already detected a deadlock.

I think that’s my recap for tonight. Good night.

Share
Posted in Concurrent Unit Testing, DrJava, Research | Leave a comment

Print This Post Print This Post  

Apple Records

I just managed to make my first sync point recording on the MacBook. I had to remove the instrumentation strategy that assigns object IDs and remove the calls to get those IDs in the strategy for synchronized blocks.

The problem could be in either the place that assigns the object ID, the place that queries it, or perhaps also just in one special class. For Windows, I’ve also had to exclude a few classes, so that might be the case for the Mac too.

I also talked to Cormac Flanagan again today. He had sent me a paper draft yesterday that outlined a technique similar to our instrumentation for model checking. The draft, unfortunately, didn’t answer my questions. He said, though, that he had also used a flag to prevent internal logic from changing the behavior of the program. Another alternative was to use two copies of Java runtime classes, one copy that is instrumented and used by the user program, and one that isn’t for the recording software.

Update

It seems like the problem is either adding the overridden $$$getObjectId$$$ method or the $$$objectId$$$ field to one or several classes that are somehow “touchy” in Apple’s VM, because adding the default method to java.lang.Object and calling it at the sync points works.

Share
Posted in Concurrent Unit Testing | Leave a comment

Print This Post Print This Post  

More Nightly Work

I’m again trying to squeeze in some work on my project during the night here at the summer school. I got up at 7 AM, showered and had breakfast. We had the first set of lectures from 9 AM to about 11:30, then from 2 to 5. At 8 PM, we walked for about two hours. Inbetween, I have managed to call my girlfriend (in case you ever read this: I love you, sweetheart) and do a few debugger runs, although I haven’t found out very much. I also haven’t touched the Apple VM again yet, I was more concerned about the recording not working (recording a bunch of zeros) on the PC, the headless version, and the recording program hanging after it is done.

It actually seems like the zeros and the headless version are connected. When I am launching the headless version from the GUI launcher, sometimes the correct synchronization points are reported. Maybe it’s a speed issue, and the headless version executes too fast. I have no idea why that would be, though, but only the first few sync points are concerned. After that, the schedules match again.

At least I think I have seen the problem with the program deadlocking at the end: I think it’s the same problem we have seen with DrJava on tablet PCs. Using Runtime.halt instead of System.exit allows the program to quit. Why? I have no idea. And the headless version is working too, except for the zeros. The RecordLauncher itself isn’t truly headless yet, it just doesn’t make its GUI visible. That’s something I still have to do.

It’s way past bedtime already, and I was supposed to read a paper and prepare a bunch of questions for Cormac Flanagan tomorrow. He has done some work with model checking concurrent Java programs using instrumentation and dynamic partial order reduction make the state space smaller. I hope I’m able to think at all tomorrow today after four three hours of sleep.

Share
Posted in Concurrent Unit Testing, Uncategorized | Leave a comment

Print This Post Print This Post  

Quick Update

During the last few days, I’ve been trying to accomplish three things:

  1. Hook the new compact code up to the old deadlock detector
  2. Write a headless version of the recorder so it can be run over an X-less SSH connection
  3. Make the instrumentation process work on Macs

Unfortunately, it seems like while trying to work on 2, I must have screwed something up, and recording wasn’t working at all anymore, on no system, not only on Macs.

Now I’m at the Summer School on Language-Based Techniques for Concurrent and Distributed Software in Eugene, Oregon, and in the few remaining hours I couldn’t track down what change had broken recording. I was only able to roll back before I started to work on the headless version to make it work.

I have taken both my new MacBook and the Windows tablet with me to the summer school, but I only had the MacBook in my carry-on luggage. So at the airport and on the plane I’ve been working on adding the ability to have multiple input files, which seems to be what’s required to make the process work on Macs. Hopefully I’ll just be able to combine both classes.jar and ui.jar in one single file and prefix that to the boot classpath.

I’m going to test that in a minute. Of course, then I still have to test whether the changes run on Windows. And on Linux… ;)

Update

The changes to support multiple input files work on Windows as well, so I committed them. Interestingly, on the Windows tablet, the problems I had noticed earlier with recording didn’t occur. Maybe my Windows system at home was just screwed up.

I also launched the recorder on the Mac for the first time. It looked promising, and I got a bunch of good recorded sync points, complete with object ID information, but then the Java program died with the message

 Invalid memory access of location 000002a6 eip=ffff07c2

I’m going to sleep now. It’s nearly 3 AM, Houston time… I guess soon I’ll have to play divide and conquer on the Mac to figure out what might be causing that access exception.

Share
Posted in Concurrent Unit Testing, Uncategorized | Leave a comment

Print This Post Print This Post  

Shower Ideas

I have realized that, as long as the bootclass mechanism with the -Xbootclasspath/p:foo.jar parameter functions the same way with the Apple JVM as with the Sun JVM, I don’t have to instrument Apple’s classes.jar and ui.jar into separate jar files. I should still be able to combine both together into a jar file like I’ve been doing it so far. That means I only have to support multiple input files for Java runtime file instrumentation (“classes.jar:ui.jar” instead of just “rt.jar”)

I digress, but the I have a naming convention for the instrumented rt.jar files, depending on whether I’m recording or replaying, and whether it uses the compact format or the compact debug format:

  1. record.jar – recording
  2. rt.record.debug.jar – recording with debug information
  3. rt.replay.jar – replaying
  4. rt.replay.debug.jar – replaying with debug info

I’m not sure if I’ve described the compact format recently. Both use an array of longs. The smallest, most efficient format has records with two longs per sync point:

  1. thread ID – 0 or greater
  2. code – 1= SP_MONITORENTER, 2 = SP_MONITOREXIT, 3 = SP_TRYMONITORENTER, 4 = SP_THREADSTART, 5 = SP_THREADEXIT, 6 = SP_END

This is the bare minimum needed for replay.

The debug format currently needs five longs per sync point:

  1. thread ID – 0 or greater
  2. code – 1= SP_MONITORENTER, 2 = SP_MONITOREXIT, 3 = SP_TRYMONITORENTER, 4 = SP_THREADSTART, 5 = SP_THREADEXIT, 6 = SP_END
  3. class index – 1 or greater, assigned by the FileInstrumentor
  4. method index and program counter – the method index is in the upper 32 bit and is assigned by the FileInstrumentor; the PC is in the lowest 16 bit
  5. object ID – -1 if it cannot be determined, 0 if it hasn’t been initialized, 1 or greater for valid IDs

The debug format is useful for me to understand the sync points that have been recorded, because it tells me where a sync point originated from and (at least if my current work with the object IDs will be successful) what objects are involved. If I know what objects are involved, I can detect deadlock.

When calling SyncPointBuffer.compactDebugAdd, the object ID is actually the first parameter. It worked out better this way; otherwise I would have had to use a local variable to rearrange the stack.

Share
Posted in Concurrent Unit Testing | Leave a comment

Print This Post Print This Post  

The Mac, The Mac…

Somehow Apple had problems getting Corky’s order of his MacBook straight: First they had lost the order and he had to order again, then they apparently found it again, and Corky ended up with two MacBooks. Since the MacBook has a dual-core CPU, he asked if I wanted to work on it. I am blessed. Thank you.

It’s a very slick machine. I’m still getting used to MacOS and several of the Apple-specific things like mounting archives, and so on, but I feel that I could really like Apple’s blend of graphical OS and Unix-like internals. I’m in the process of installing all the things I need, like Perforce for my one-user mini repository, Ant, Subversion, etc… Most of it is working already. I don’t have a decent office suite yet, though.

Unfortunately, many things are really different. It took me a long time to find something that resembled a JDK. For the Mac, Apple makes the JVM, so it’s neither in a place that looks normal to me, nor can I just go and download a stock JVM from Sun. And I thought Sun was brewing it’s own strange blend of coffee…

Apparently, the rt.jar file is split into two parts on the Mac, classes.jar and ui.jar:

  1. /System/Library/Frameworks/JavaVM.framework/Versions/1.5.0/Classes/classes.jar
  2. /System/Library/Frameworks/JavaVM.framework/Versions/1.5.0/Classes/ui.jar

I may have to extend the capabilities of the FileInstrumentor a little to handle that, even though it should already be able to handle it: I’d just have to treat one of the jars as a normal input file, and it won’t be auto-detected.

It’s just a little bit sad that the differences are so great that I’ll have to adapt my project to the MacOS platform, when I haven’t even finished adapting and testing it on Linux, and I’m in such an awful time crunch right now since I’m leaving for Oregon next Tuesday.

Share
Posted in Concurrent Unit Testing, Uncategorized | Leave a comment

Print This Post Print This Post  

Identity Hash Code as Object ID

At our meeting today, Corky again reminded me that it is ok to just come up with an approximation, and then later, in case of a false positive, check if a deadlock has actually occurred. That may make it possible to use System.identityHashCode(Object x) for the cases in which I currently can’t assign a unique ID. The identity hash codes of objects are not guaranteed to be unique, but most of the time, this is actually true. It is guaranteed that the identity hash code remains the same for an object’s entire lifetime, though. As a result, if identity hash codes are used, I will always be able to detect a deadlock if one has occurred; if two distinct objects happen to have the same hash codes, then a deadlock might be reported even if there isn’t one. The detector might have false positives.

I should be able to recognize a false positive on the master side, though, simply by checking if the thread that closed the apparent cycle in the wait graph at the SP\_TRYMONITORENTER sync point was able to continue past the monitorenter instruction and reached the SP\_MONITORENTER sync point. If that was the case, then a false positive was reported.

The JDI ThreadReference class also has two methods, public ObjectReference currentContendedMonitor() and public List<ObjectReference> ownedMonitors(), and I could just let the program continue for a few milliseconds after detecting a possible deadlock when a thread reaches the SP\_TRYMONITORENTER sync point, and then see if the thread’s current contended monitor is among the owned monitors of the other threads involved in the potential deadlock.

With identity hash codes, I’ll have the same problem as with using the class index as object ID that I had described earlier: If I use the identity hash code, I have to distinguish those from actual unique IDs. The identity hash code is only an int (and seems to always be non-negative), though, so it could be mapped into the negative range of the object ID, which is a long. I cannot use the identity hash code and the class index at the same time, but the identity hash code seems to be the better option anyway,

So I guess I’ll replace the implementation of java.lang.Object.$$$getObjectID$$$ from returning -1 to returning a bit-twiddled System.identityHashCode(this).

Share
Posted in Concurrent Unit Testing | Leave a comment

Print This Post Print This Post  

Object IDs for Class Objects

I have realized that it is of crucial importance to get this scheme to work with a class’ java.lang.Class object, because that is what static synchronized methods use for synchronization. However, like I said before, I can probably overload the $$$getObjectID$$$ method for java.lang.Class and somehow use the class index that I’m generating for the method database.

The first problem is that I need to distinguish a class index from a normal object ID. I can probably do that by simply negating them and subtracting one: -1 is the code for an unavailable object ID, so class index 1 would correspond to object ID -2, and so on.

Right now, I don’t yet know how to even get to that class index, though. And I don’t know if there’s a difference between a class’ Class object, like Integer.class or one that gets created later for reflection.

Much to ponder, but at progress at last/least.

Share
Posted in Concurrent Unit Testing | Leave a comment

Print This Post Print This Post