Thursday, January 8, 2009

Home-made Torque monitoring

I've always been frustrated by the tools for finding out what's going on with Torque/Maui. In particular, it's hard to get an overview of the cluster state. So I compiled pbs_python and wrote a small CGI web application to show the information I was interested in. For the jobs running on each cluster node it shows the owner, efficiency and memory usage, and it colour-codes the details: grey for under-utilisation and red for over-utilisation. It's not perfect, but it's useful for me.

It's available at http://grid.ie/distribution/clustermon
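
To give an idea of how the efficiency figure and the colour-coding can work, here is a minimal sketch in Python. It is not the clustermon code itself. It assumes efficiency means CPU time divided by wall-clock time (per allocated CPU), that both values arrive as Torque-style HH:MM:SS strings, and that the thresholds are 50% and 105%; the function names and the thresholds are made up for illustration.

```python
def hms_to_seconds(value):
    """Convert an 'HH:MM:SS' string (hours may exceed 24) to seconds."""
    hours, minutes, seconds = (int(part) for part in value.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def cpu_efficiency(cput, walltime, ncpus=1):
    """Return CPU time / (wall time * CPUs), or None if no wall time yet.

    Assumption: this is the definition of 'efficiency'; the real tool
    may compute it differently.
    """
    wall = hms_to_seconds(walltime) * ncpus
    if wall == 0:
        return None
    return hms_to_seconds(cput) / wall


def colour_for(efficiency, low=0.5, high=1.05):
    """Pick a display colour; the thresholds are hypothetical defaults."""
    if efficiency is None:
        return "black"   # nothing to judge yet
    if efficiency < low:
        return "grey"    # under-utilisation
    if efficiency > high:
        return "red"     # over-utilisation
    return "black"       # within the expected range


if __name__ == "__main__":
    eff = cpu_efficiency("00:30:00", "01:00:00")
    print(eff, colour_for(eff))
```

The same pattern would apply to memory: compare the memory a job has used with what it requested and pick the colour from the ratio.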

P.S. If something better exists out there, I'd be very interested in hearing about it. I've never found anything that does quite what I want.