Tuesday, March 11, 2008

But my proxy hasn't expired!

We have been plagued by a frustrating problem, especially in our test environment: users generate a new proxy, submit a job immediately, and then get an error like this:


[childss@ui childss]$ edg-job-status https://cagraidsvr18.cs.tcd.ie:9000/nbPfABOjQHsG7IcFCJcYLg


*************************************************************
BOOKKEEPING INFORMATION:

Status info for the Job : https://cagraidsvr18.cs.tcd.ie:9000/nbPfABOjQHsG7IcFCJcYLg
Current Status: Aborted
Status Reason: Job proxy is expired.
Destination: gridgate02.testgrid.:2119/jobmanager-lcgpbs-test
reached on: Tue Mar 11 08:47:31 2008
*************************************************************


This is very annoying, as the proxy obviously hasn't expired. It turns out to be caused by old jobs stuck on the Resource Broker (RB) whose proxies have expired. The problem can be cleared by logging onto the RB, identifying the old jobs for the user's DN and removing them with condor_rm. I'll leave it to someone else to explain why this arises; I hope it's been fixed in the new WMS.
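For anyone who wants to script that cleanup, here is a minimal sketch in Python that builds the condor_q and condor_rm command lines for a given DN. It rests on assumptions that should be checked on your own RB first: that the job ads carry the proxy subject in an attribute called x509userproxysubject, and that QDate (the submission time as a Unix timestamp) is a reasonable way to pick out "old" jobs. The helper names are made up for illustration.

```python
import subprocess

# Assumption: the Condor job ad stores the proxy subject (the user's DN)
# in this attribute. Inspect a stuck job with "condor_q -l" to confirm.
PROXY_SUBJECT_ATTR = "x509userproxysubject"


def build_constraint(user_dn, submitted_before=None):
    """Return a ClassAd expression matching jobs owned by user_dn.

    If submitted_before (Unix timestamp) is given, only jobs queued
    before that time match, so freshly submitted jobs are left alone.
    """
    if '"' in user_dn:
        # Keep the sketch simple: refuse DNs that would need escaping.
        raise ValueError("DN must not contain double quotes")
    expr = '%s == "%s"' % (PROXY_SUBJECT_ATTR, user_dn)
    if submitted_before is not None:
        expr += " && QDate < %d" % int(submitted_before)
    return expr


def list_command(user_dn, submitted_before=None):
    """Argument list for listing the matching jobs (run this first)."""
    return ["condor_q", "-constraint",
            build_constraint(user_dn, submitted_before)]


def remove_command(user_dn, submitted_before=None):
    """Argument list for removing the matching jobs."""
    return ["condor_rm", "-constraint",
            build_constraint(user_dn, submitted_before)]


def run(command):
    """Execute one of the argument lists above and return its exit code."""
    return subprocess.call(command)
```

On the RB you would call run(list_command(dn, cutoff)) first, check that only the stale jobs are listed, and only then call run(remove_command(dn, cutoff)). Passing the arguments as a list avoids shell quoting problems with the spaces and slashes in a DN.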