I’ve got to admit: Python is pretty cool when it comes to quickly writing powerful scripts. I wanted to extract the number of all-time downloads from DrJava’s SourceForge statistics page, but the number wasn’t on the same line as the word “Total”, so a simple sed one-liner wasn’t enough.
Of course I could have written it in Java, but that would have involved compiling it and having class files in addition to the Java file. I probably could have written it as a bash script, but to be honest, bash is pretty clunky. Python did the job easily and well.
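    import re
    import urllib  # Python 2; in Python 3 the equivalent is urllib.request.urlopen

    # Fetch the all-time download statistics page.
    filehandle = urllib.urlopen('http://sourceforge.net/project/stats/detail.php?group_id=44253&ugn=drjava&type=prdownload&mode=alltime&file_id=0')
    tag = re.compile(r'<.*?>')  # matches any HTML tag
    found = False
    for line in filehandle.readlines():
        # Strip surrounding whitespace and HTML tags from the line.
        text = tag.sub('', line.strip())
        # text = text.replace(',', '')  # optionally drop thousands separators
        if found and text:
            # Assumption: the count is the first non-empty text after "Total".
            print(text)
            break
        if line.find('Total') != -1:
            found = True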
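For readers on Python 3, the same idea can be sketched as a small function that works on any list of lines, which makes it easy to try without hitting the network. This is only a minimal sketch under assumptions: the names `extract_total` and `fetch_total` are hypothetical, the sample markup is made up for illustration, and the count is assumed to be the first non-empty text that follows the line containing “Total”.

```python
import re
from urllib.request import urlopen

TAG = re.compile(r'<.*?>')  # matches any HTML tag


def extract_total(lines):
    """Return the first number that follows a line containing 'Total'.

    Returns None if 'Total' never appears or nothing follows it.
    """
    found = False
    for line in lines:
        text = TAG.sub('', line).strip()
        if found and text:
            # Drop thousands separators before converting, e.g. "12,345".
            return int(text.replace(',', ''))
        if 'Total' in line:
            found = True
    return None


def fetch_total(url):
    """Download the page at `url` and extract the total (hypothetical helper)."""
    with urlopen(url) as response:
        html = response.read().decode('utf-8', errors='replace')
    return extract_total(html.splitlines())


# Hypothetical markup, made up for illustration; the real page may differ.
sample = [
    '<tr><td><b>Total</b></td>',
    '',
    '<td>12,345</td></tr>',
]
print(extract_total(sample))
```

Keeping the parsing separate from the download means the extraction logic can be checked against a saved copy of the page. If the text after “Total” turns out not to be a number, `int` raises a `ValueError`, which is probably what you want from a script that runs unattended.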
There are probably better, more elegant ways of doing this, but Python is one of those languages that I use but never learned, just like Perl or PHP. Maybe it’s a “P” thing. No, it isn’t: I actually learned Pascal at my Gymnasium (German secondary school) and at university in Germany.
Anyway, using this script I have now integrated a download counter on the DrJava website that gets updated every midnight. We’re seriously getting close to a million, faster than I expected. This is probably because of the new DrJava beta version we released.
With fewer than 5,000 downloads to go, we might hit the million as early as the beginning of May!