On Friday, before and during COMP 312, the production programming class that drives a lot of the development of DrJava, we discussed the reasons for the apparent slowdown of DrJava. At the beginning, I quite randomly mentioned that the reduced model was in list form and therefore wasn’t set up to handle more complicated requests, such as what class the cursor was in. Naturally it wasn’t set up to do that, that wasn’t on the programmers’ minds when the code was written, and it performs quite well for syntax highlighting, the main purpose, and pretty much acceptably for indenting. I just thought out loud if Barnaby’s attempt to write a tree-shaped reduced model would help satisfy those more complicated queries.
Corky vigorously defended the list-shaped reduced model. I’m not quite sure why, because I wasn’t even attacking it. Then he mentioned the slowdowns of DrJava that he had noticed, compared to about a year ago. A year ago was approximately when I started making major changes to the DrJava codebase and adding features like “Go to File”, persistent breakpoints and bookmarks, “Open Javadoc”, a simple form of code completion, a clipboard history, and “Find All”, all things that I’m sure absolutely nobody wants to miss anymore. Most of these features were driven by my own desire to use DrJava on bigger projects, and I always found something missing; unfortunately, I’m quite sure most people don’t even know about all the things I implemented, even though I documented them quite thoroughly (the first documentation update in a year and a half).
At the same time I had to admit that Corky was right: Opening the DrJava project took about three times as long with a recent version than with a version from January 2006, so something was definitely going on, although I was pretty sure that my changes had nothing to do with it. Sure, many of my changes, especially “Find All”, were not implemented in the best way, but they were implemented solidly with what the DrJava source code gave me. I added a few parts with extremely high code reuse, like the “predictive input dialog” and regions, then I recombined these new things with what was already there. To make “Find All” work, I have to cycle through all documents that contained matches, and this is actually visible in the title bar; however, now there’s a very convenient “Find All” feature (kind of like grep), before there wasn’t, and my code doesn’t do anything that a manual search by repeatedly pressing “Find Next” wouldn’t have done. Well, it does… it highlights the matches, if desired.
I felt like I wanted to prove that my changes had absolutely nothing to do with the slowdown, so I started playing around with the YourKit Java Profiler, a new and fantastic toy that we may use free of charge for our open source project. Just before Corky left for the weekend, I was able to demonstrate that one of the slow parts of DrJava, “Go to Line”, was due to horrible code that abused the list-shaped reduced model: It set the cursor of the reduced model to the beginning, and then moved it, counting newlines. Setting the line seems like a simple setter that should have constant runtime, but this algorithm was actually executing linearly! I promised I would rewrite the code by using Swing information.
When I had done that, I did some more profiling and received puzzling information: While the actual “Go to Line” operation was faster, something else took absolutely forever afterwards; with the old code, it didn’t, probably because something was cached. After some more profiling, I identified that the getter used to update the current line number in the status bar was the culprit. It was written in the same way “Go to Line” was written, and this code was being executed every time the cursor was moved. Absolutely horrible code, and the cause was abuse of the reduced model.
I had never claimed that the reduced model was bad, but I had stated that it wasn’t suited for more complicated queries. Unknowingly we were making unnecessarily complicated queries for “Go to Line” and to find the current line number, we were abusing the reduced model, and totally killing our performance.
By the time I left around 10 PM, I was relieved that my code actually had nothing to do with the slowdown, but I was a bit worried that my lucky guess about the reduced model made me sound “too right”. I was actually pretty worked up over this. I didn’t want to rub anyone’s nose in this, I just made a lucky guess, but I didn’t know how my statements and findings would be interpreted. I went to the lab to do some more profiling and perhaps fix the problems on Sunday, but Corky had already done all the work.
Today at the lab meeting, he lauded my good work profiling and how it helped him fix the problems, and said each and every member of the DrJava team should learn how to use YourKit. I agree, it’s a fantastic tool. I wrote a long email today that described what I had learned about it, but of course, knowing human nature (at least if it is anywhere close to my human nature), I doubt many people actually went through it and tried to use YourKit.
Here’s the email I sent out, just to share my small slice of knowledge with the world once more:
Hi, I installed YourKit into the javaplt account on CSnet. Here are instructions if you want to use it. Please note that this is for the Linux environment on CSnet. For Windows and MacOS, or away from CSnet, these steps don't apply. 1. The directory YourKit is in is /home/javaplt/packages/yjp-6.0.11 2. Please set the following environment variable, probably in your .bashrc file: setenv LD_LIBRARY_PATH /home/javaplt/packages/yjp-6.0.11/bin/linux-x86-32 3. To start YourKit, run the following script from the command line: /home/javaplt/packages/yjp-6.0.11/bin/yjp.sh & The & at the end will make YourKit run in the background, i.e. you can still use the same command line window you've been using. The first time you start YourKit, it should ask you for your license information as explained in the previous email. 4. You need to pass a special command line argument to your java executable when you're running a program that you want to profile: -agentlib:yjpagent To make that easier to remember, I have written a script that passes that arguments, and all the argument passed to the script, to java. It's located in: /home/javaplt/packages/yjp-6.0.11/bin/yjpjava So if you want to use it to run a program that you would normally start with: java MyClass just use: /home/javaplt/packages/yjp-6.0.11/bin/yjpjava MyClass Here is a more complicated example to start DrJava. Instead of writing: java -cp drjava-15.jar edu.rice.cs.drjava.DrJava now write: /home/javaplt/packages/yjp-6.0.11/bin/yjpjava -cp drjava-15.jar edu.rice.cs.drjava.DrJava You may want to include the bin directory in your path or make symlinks to yjp.sh and yjpjava. 5. Note that even though YourKit is running AND you have started your Java program now, the two aren't automatically linked. To do that, click on "Connect to locally running profiled application..." in YourKit's "Welcome" screen. That should open a new pane that displays CPU and memory usage graphs. 6. When you actually want to record usage times and counts, press the "Start CPU Profiling" button. That will open another pane. There are two selections, "Sampling" and "Tracing", and as far as I know, "Sampling" records times only and doesn't give you invocation counts and call stacks, but is faster. "Tracing" is a lot slower but gives you more information. 7. When you want to view the data, i.e. when you're done performing the action you want to profile, press the "Capture Snapshot" button. That will bring up a dialog that tells you the file name as which the snapshot has been saved, and it asks you whether you want to open the snapshot. You probably want to press "Open" if you want to look at the data right now. 8. Note that taking a snapshot stops profiling, so if you want to continue to profile, you need to press the "Start CPU Profiling" button again. 9. On the page that displays CPU snapshot informations, there are lots of ways to gather information. I haven't explored everything. I found selecting "Method List" on the left the most useful. That will give you a lsit of all methods that were called, and you can sort by name, time, own time, and invocation count (if you used tracing). Time includes time spent in calls to other methods, i.e. the time from method entry to method exit. Own time is only the time spent in the actual method, i.e. the time from method entry to method exit EXCLUDING the time spent in other methods that were called. Usually I found a very sharp break between the times, i.e. going from 40 seconds in some methods (that were probably in a call chain) to 5 seconds. My intuition tells me that the last method to have a lot of time spent is one of the "hot spots". You should also look at the invocation count: The program may spend a lot of time in a method, but if it's called very often, that may not mean very much. This should really only be a guide to get you started. I'm far from being an expert in using YourKit, so please play with it and learn by using it. If you find something useful, perhaps you can let the rest of us know. If there are any problems, please let me know. Please also take into account that we only have licenses for development on DrJava; if we need additional licenses for DrJava developers, let me know and I will request them. If you want to get free licenses for other open source projects, please contact the YourKit licensing people directly. --Mathias
I really think DrJava is a lot better now than it was in January 2006. I enjoy using it. It has features it desperately needed for big-time development, such as developing DrJava in DrJava, our very mantra. Now that some of the problems that had nothing to do with my additions have been removed, it is even better.