I’ve got to admit: Python is pretty cool when it comes to quickly writing powerful scripts. I wanted to extract the number of all-time downloads from DrJava’s SourceForge statistics page, but the number wasn’t on the same line as the word “Total”, so a simple sed one-liner wasn’t enough.
Of course I could have written it in Java, but that would have involved compiling it and having class files in addition to the Java file. I probably could have written it as a bash script, but to be honest, bash is pretty clunky. Python did the job easily and well.
    import shutil
    import os
    import time
    import datetime
    import math
    import urllib
    import re
    from array import array

    filehandle = urllib.urlopen('http://sourceforge.net/project/stats/detail.php?group_id=44253&ugn=drjava&type=prdownload&mode=alltime&file_id=0')

    found = False
    for lines in filehandle.readlines():
        if found:
            text = lines.strip()
            p = re.compile(r'<.*?>')
            text = p.sub('', text)
            # p = re.compile(r',')
            # text = p.sub('', text)
            print text
            break
        if lines.find('Total') != -1:
            found = True

    filehandle.close()
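The script above is Python 2 (`print text`, `urllib.urlopen`). For anyone who wants to try the same trick today, here is a rough Python 3 sketch of the idea: find the first line containing “Total”, then strip the HTML tags from the line after it. The names `extract_total` and `fetch_total`, the `strip_commas` option (standing in for the commented-out comma removal), and the sample HTML are my own inventions for illustration; the real page layout may well differ, and the original URL may no longer work.

```python
import re
import urllib.request

# Same non-greedy pattern as the original script: matches any HTML tag.
TAG_RE = re.compile(r'<.*?>')


def extract_total(lines, marker='Total', strip_commas=False):
    """Return the tag-stripped text of the line that follows the first
    line containing `marker`, or None if there is no such line."""
    found = False
    for line in lines:
        if found:
            text = TAG_RE.sub('', line.strip())
            if strip_commas:
                # Optional: turn "12,345" into "12345".
                text = text.replace(',', '')
            return text
        if marker in line:
            found = True
    return None


def fetch_total(url):
    """Download `url` and run extract_total on its lines.
    Assumes the page decodes as UTF-8; undecodable bytes are replaced."""
    with urllib.request.urlopen(url) as response:
        html = response.read().decode('utf-8', errors='replace')
    return extract_total(html.splitlines())


# Made-up sample lines, only to show the behavior without network access.
sample = [
    '<tr><td><b>Total</b></td>',
    '<td align="right">12,345</td></tr>',
]
print(extract_total(sample))
print(extract_total(sample, strip_commas=True))
```

Keeping the parsing in `extract_total` separate from the download makes it easy to try the logic on a saved copy of the page before pointing `fetch_total` at a live URL.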
There are probably better, more elegant ways of doing this, but Python is one of those languages that I use but never learned, just like Perl or PHP. Maybe it’s a “P” thing. No, it isn’t: I actually learned Pascal at my Gymnasium (German secondary school) and at university in Germany.
Anyway, using this script I have now integrated a download counter on the DrJava website that gets updated every midnight. We’re seriously getting close to a million, faster than I expected. This is probably because of the new DrJava beta version we released.
With fewer than 5,000 downloads to go, we might hit the million as early as the beginning of May!