<?xml version='1.0' encoding='UTF-8'?><?xml-stylesheet href="http://www.blogger.com/styles/atom.css" type="text/css"?><feed xmlns='http://www.w3.org/2005/Atom' xmlns:openSearch='http://a9.com/-/spec/opensearchrss/1.0/' xmlns:georss='http://www.georss.org/georss' xmlns:gd='http://schemas.google.com/g/2005' xmlns:thr='http://purl.org/syndication/thread/1.0'><id>tag:blogger.com,1999:blog-7250102906278505005</id><updated>2011-07-08T16:08:02.499+01:00</updated><title type='text'>Grid-Ireland Operations</title><subtitle type='html'></subtitle><link rel='http://schemas.google.com/g/2005#feed' type='application/atom+xml' href='http://gridirelandops.blogspot.com/feeds/posts/default'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/7250102906278505005/posts/default?max-results=100'/><link rel='alternate' type='text/html' href='http://gridirelandops.blogspot.com/'/><link rel='hub' href='http://pubsubhubbub.appspot.com/'/><author><name>DOC</name><uri>http://www.blogger.com/profile/05284063768447850150</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><generator version='7.00' uri='http://www.blogger.com'>Blogger</generator><openSearch:totalResults>12</openSearch:totalResults><openSearch:startIndex>1</openSearch:startIndex><openSearch:itemsPerPage>100</openSearch:itemsPerPage><entry><id>tag:blogger.com,1999:blog-7250102906278505005.post-754178351822621395</id><published>2009-06-18T10:08:00.004+01:00</published><updated>2009-06-18T10:19:16.377+01:00</updated><title type='text'>DPM 1.7.0 upgrade</title><content type='html'>I took advantage of a downtime to upgrade our DPM server. We need the upgrade as we want to move files around using dpm-drain and don't want to lose space token associations. As we don't use YAIM I had to run the upgrade script manually, but it wasn't too difficult. 
Something like this should work (after putting the password in a suitable file):&lt;br /&gt;&lt;br /&gt;&lt;code&gt;&lt;br /&gt;./dpm_db_310_to_320 --db-vendor MySQL --db $DPM_HOST  --user dpmmgr --pwd-file /tmp/dpm-password --dpm-db dpm_db&lt;br /&gt;&lt;/code&gt;&lt;br /&gt;&lt;br /&gt;I discovered a few things to watch out for along the way though. Here's my checklist:&lt;br /&gt;&lt;ol&gt;&lt;br /&gt;&lt;li&gt;Make sure you have enough space on your system disk: I got bitten by this on a test server. The upgrade script needs a good chunk of space (comparable to that already used by the MySQL DB?) to perform the upgrade&lt;br /&gt;&lt;li&gt;There's a mysql setting you probably need to tweak first: add &lt;tt&gt;set-variable=innodb_buffer_pool_size=256M&lt;/tt&gt; to the &lt;tt&gt;[mysqld]&lt;/tt&gt; section in &lt;tt&gt;/etc/mysql.conf&lt;/tt&gt; and restart mysql. Otherwise you get this cryptic error:&lt;br /&gt;&lt;code&gt;&lt;br /&gt;Thu Jun 18 09:02:30 2009 : Starting to update the DPNS/DPM database.&lt;br /&gt;Please wait...&lt;br /&gt;failed to query and/or update the DPM database : DBD::mysql::db do failed: The total number of locks exceeds the lock table size at UpdateDpmDatabase.pm line 19.&lt;br /&gt;Issuing rollback() for database handle being DESTROY'd without explicit disconnect().&lt;br /&gt;&lt;/code&gt;&lt;br /&gt;&lt;br /&gt;Also worth noting is that if this happens to you, when you try to re-run the script (or YAIM) you will get this error:&lt;br /&gt;&lt;code&gt;&lt;br /&gt;failed to query and/or update the DPM database : DBD::mysql::db do failed: Duplicate column name 'r_uid' at UpdateDpmDatabase.pm line 18.&lt;br /&gt;Issuing rollback() for database handle being DESTROY'd without explicit disconnect().&lt;br /&gt;&lt;/code&gt;&lt;br /&gt;&lt;br /&gt;This is because the script has already done this step. 
You need to edit &lt;tt&gt;/opt/lcg/share/DPM/dpm-db-310-to-320/UpdateDpmDatabase.pm&lt;/tt&gt; and comment out this line:&lt;br /&gt;&lt;code&gt;&lt;br /&gt;  $dbh_dpm-&gt;do ("ALTER TABLE dpm_get_filereq ADD r_uid INTEGER");&lt;br /&gt;&lt;/code&gt;&lt;br /&gt;&lt;br /&gt;You should then be able to run the script to completion.&lt;br /&gt;&lt;/ol&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/7250102906278505005-754178351822621395?l=gridirelandops.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://gridirelandops.blogspot.com/feeds/754178351822621395/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=7250102906278505005&amp;postID=754178351822621395' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/7250102906278505005/posts/default/754178351822621395'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/7250102906278505005/posts/default/754178351822621395'/><link rel='alternate' type='text/html' href='http://gridirelandops.blogspot.com/2009/06/dpm-170-upgrade.html' title='DPM 1.7.0 upgrade'/><author><name>Stephen Childs</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-7250102906278505005.post-2401454186162631617</id><published>2009-06-08T13:41:00.002+01:00</published><updated>2009-06-08T13:53:48.538+01:00</updated><title type='text'>STEP '09 discoveries</title><content type='html'>ATLAS have been giving our site a good thrashing over the past week, which has helped us shake out a number of issues with our setup. 
Here's some of what we've learned.&lt;br /&gt;&lt;h3&gt;Intel 10G cards don't work well with SL4 kernels&lt;/h3&gt;We're currently upgrading our networking to 10G and had it mostly in place by the time STEP'09 started. However, we discovered that the stock SL4 kernel (2.6.9) doesn't support the ixgbe 10G driver very well. It was hard to detect because we could get reasonable transmit performance but receive was limited to 30 Mbit/s! It's basically an issue with interrupts (MSI-X and multi-queue weren't enabled). I compiled up a 2.6.18 SL5 kernel for SL4 and that works like a charm (once you've installed it using --nodeps).&lt;br /&gt;&lt;br /&gt;&lt;h3&gt;It's worth tuning RFIO&lt;/h3&gt;We had loads of ATLAS analysis jobs pulling data from the SE and they were managing to saturate the read performance of our disk array. See &lt;a href="http://northgrid-tech.blogspot.com/2008/12/rfio-tuning-for-atlas-analysis-jobs.html"&gt;this NorthGrid post&lt;/a&gt; for solutions.&lt;br /&gt;&lt;br /&gt;&lt;h3&gt;Fair-shares don't work too well if someone stuffs your queues&lt;/h3&gt;We'd set up shares for the various ATLAS sub-groups but the generic analysis jobs submitted via Ganga were getting much more time. On digging deeper with Maui's &lt;tt&gt;diagnose -p&lt;/tt&gt; I could see that the length of time they'd been queued was overriding the priority due to fairshare. I was able to fix this by increasing the value of FSWEIGHT in Maui's config file.&lt;br /&gt;&lt;br /&gt;&lt;h3&gt;You need to spread VOs over disk servers&lt;/h3&gt;We had a nice tidy setup where all the ATLAS filesystems were on one DPM disk server. Of course this then got hammered ... 
we're now trying to spread out the data across multiple servers.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/7250102906278505005-2401454186162631617?l=gridirelandops.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://gridirelandops.blogspot.com/feeds/2401454186162631617/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=7250102906278505005&amp;postID=2401454186162631617' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/7250102906278505005/posts/default/2401454186162631617'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/7250102906278505005/posts/default/2401454186162631617'/><link rel='alternate' type='text/html' href='http://gridirelandops.blogspot.com/2009/06/step-09-discoveries.html' title='STEP &apos;09 discoveries'/><author><name>Stephen Childs</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-7250102906278505005.post-7570578662235378460</id><published>2009-03-05T14:18:00.001Z</published><updated>2009-03-05T14:18:46.313Z</updated><title type='text'>Another day, another globus error</title><content type='html'>After almost 5 years at this lark, I thought I'd got a handle on most of the cryptic globus errors. 
However, today ATLAS production jobs started failing with errors like this:&lt;br /&gt;&lt;span style="font-family:monospace;"&gt;&lt;br /&gt;018 (9163559.001.000) 03/05 11:25:27 Globus job submission failed!&lt;br /&gt;&lt;pre&gt;  Reason: 22 the job manager failed to create an internal script argument file&lt;br /&gt;&lt;/pre&gt;&lt;/span&gt;Google didn't provide any help, but after asking on LCG rollout, it looked like the problem was the number of files in the relevant user's account. This turned out to be because the script /opt/lcg/sbin/cleanup-grid-accounts.sh that cleans up the grid accounts hadn't run in some days and there were almost 32000 files in that account's directory.&lt;br /&gt;&lt;br /&gt;So there's yet another vital cog in the grid wheel that can fail fairly silently and cause inexplicable errors! Time to add a Nagios sensor to check that this cron job runs successfully every night ...&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/7250102906278505005-7570578662235378460?l=gridirelandops.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://gridirelandops.blogspot.com/feeds/7570578662235378460/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=7250102906278505005&amp;postID=7570578662235378460' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/7250102906278505005/posts/default/7570578662235378460'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/7250102906278505005/posts/default/7570578662235378460'/><link rel='alternate' type='text/html' href='http://gridirelandops.blogspot.com/2009/03/another-day-another-globus-error.html' title='Another day, another globus error'/><author><name>Stephen Childs</name><email>noreply@blogger.com</email><gd:image 
rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-7250102906278505005.post-5163668986716085164</id><published>2009-01-08T16:15:00.004Z</published><updated>2009-01-09T13:56:48.260Z</updated><title type='text'>Home-made Torque monitoring</title><content type='html'>I've always been frustrated by the tools for finding out what's going on with Torque/Maui. In particular, it's hard to get an overview of the cluster state. So I compiled up &lt;a href="https://subtrac.sara.nl/oss/pbs_python"&gt;pbs_python&lt;/a&gt; and wrote a little web CGI application to provide the information I was interested in. It shows information on jobs running on each cluster node: owner, efficiency, memory usage. It colour-codes the details: grey for under-utilisation and red for over-utilisation. Not perfect but useful for me.&lt;br /&gt;&lt;br /&gt;It's available at &lt;a href="http://grid.ie/distribution/clustermon"&gt;http://grid.ie/distribution/clustermon&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;P.S. if something better exists out there, I'd be very interested in hearing about it. 
I've never found anything that does quite what I want.&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://2.bp.blogspot.com/_DMJNZnUX1Z4/SWdXZ8JNPoI/AAAAAAAAAAU/610o5qCnfs8/s1600-h/clustermon.png"&gt;&lt;img style="margin: 0pt 10px 10px 0pt; float: left; cursor: pointer; width: 320px; height: 222px;" src="http://2.bp.blogspot.com/_DMJNZnUX1Z4/SWdXZ8JNPoI/AAAAAAAAAAU/610o5qCnfs8/s320/clustermon.png" alt="" id="BLOGGER_PHOTO_ID_5289292390523027074" border="0" /&gt;&lt;/a&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/7250102906278505005-5163668986716085164?l=gridirelandops.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://gridirelandops.blogspot.com/feeds/5163668986716085164/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=7250102906278505005&amp;postID=5163668986716085164' title='4 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/7250102906278505005/posts/default/5163668986716085164'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/7250102906278505005/posts/default/5163668986716085164'/><link rel='alternate' type='text/html' href='http://gridirelandops.blogspot.com/2009/01/home-made-torque-monitoring.html' title='Home-made Torque monitoring'/><author><name>Stephen Childs</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://2.bp.blogspot.com/_DMJNZnUX1Z4/SWdXZ8JNPoI/AAAAAAAAAAU/610o5qCnfs8/s72-c/clustermon.png' height='72' 
width='72'/><thr:total>4</thr:total></entry><entry><id>tag:blogger.com,1999:blog-7250102906278505005.post-3377932139499038899</id><published>2008-12-04T10:08:00.003Z</published><updated>2008-12-04T10:18:34.690Z</updated><title type='text'>Cron Security</title><content type='html'>After the recent Security Challenge we became aware that any pool user could create &lt;em&gt;at&lt;/em&gt; and &lt;em&gt;cron&lt;/em&gt; jobs on our cluster: obviously not good for security or scheduling.&lt;br /&gt;&lt;br /&gt;Initially we wondered if we'd need to create SELinux policies to restrict this but it's much simpler than that. &lt;em&gt;Cron&lt;/em&gt; and &lt;em&gt;at&lt;/em&gt; support simple allow and deny files to control which users can use the commands. &lt;code&gt;/etc/cron.deny&lt;/code&gt; specifies which users are denied access, and &lt;code&gt;/etc/cron.allow&lt;/code&gt; specifies which users are allowed. (For full details, see &lt;code&gt;man crontab&lt;/code&gt;.)&lt;br /&gt;&lt;br /&gt;In &lt;code&gt;/etc/cron.deny&lt;/code&gt; we put:&lt;br /&gt;&lt;pre&gt;   ALL&lt;br /&gt;&lt;/pre&gt;and in &lt;code&gt;/etc/cron.allow&lt;/code&gt; we put:&lt;br /&gt;&lt;pre&gt;   root&lt;br /&gt;   admina&lt;br /&gt;   adminb&lt;br /&gt;   ...&lt;br /&gt;&lt;/pre&gt;where &lt;code&gt;admina&lt;/code&gt;, &lt;code&gt;adminb&lt;/code&gt; and so on are the admin users who should have cron access. &lt;code&gt;/etc/at.deny&lt;/code&gt; and &lt;code&gt;/etc/at.allow&lt;/code&gt; are configured the same way.&lt;br /&gt;&lt;br /&gt;This is configured through &lt;a href="http://quattor.org/"&gt;Quattor&lt;/a&gt;. 
For now we're using the &lt;em&gt;filecopy&lt;/em&gt; component to install the config files, but this might be a useful extension to the &lt;em&gt;cron&lt;/em&gt; component.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/7250102906278505005-3377932139499038899?l=gridirelandops.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://gridirelandops.blogspot.com/feeds/3377932139499038899/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=7250102906278505005&amp;postID=3377932139499038899' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/7250102906278505005/posts/default/3377932139499038899'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/7250102906278505005/posts/default/3377932139499038899'/><link rel='alternate' type='text/html' href='http://gridirelandops.blogspot.com/2008/12/cron-security.html' title='Cron Security'/><author><name>DOC</name><uri>http://www.blogger.com/profile/05284063768447850150</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-7250102906278505005.post-6841158121082307155</id><published>2008-09-11T08:43:00.002+01:00</published><updated>2008-09-11T08:53:11.268+01:00</updated><title type='text'>LHC switch-on in Ireland</title><content type='html'>We had a great day yesterday at Trinity's &lt;a href="http://www.sciencegallery.ie/"&gt;Science Gallery&lt;/a&gt; where we had a live feed from CERN running all day. 
There was a lot of press interest and the grid featured heavily due to the fact that the &lt;a href="http://grid.ie/opscentre.html"&gt;grid group here at TCD&lt;/a&gt; makes up half of Ireland's LHC involvement (the other half being the &lt;a href="http://www.ucd.ie/physics/lhcb/index.html"&gt;particle physics group at UCD&lt;/a&gt; who are in LHCb). We had the GridPP real-time monitor running all day, which provoked a lot of interest and made it onto &lt;a href="http://www.rte.ie/news/2008/0910/hadron_av.html"&gt;national TV&lt;/a&gt;. One interesting side-effect of all the publicity is that the man on the street now knows that Ireland is one of the few European countries that isn't a member of CERN -- maybe it will cause the politicians to reconsider.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/7250102906278505005-6841158121082307155?l=gridirelandops.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://gridirelandops.blogspot.com/feeds/6841158121082307155/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=7250102906278505005&amp;postID=6841158121082307155' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/7250102906278505005/posts/default/6841158121082307155'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/7250102906278505005/posts/default/6841158121082307155'/><link rel='alternate' type='text/html' href='http://gridirelandops.blogspot.com/2008/09/lhc-switch-on-in-ireland.html' title='LHC switch-on in Ireland'/><author><name>Stephen Childs</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' 
src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-7250102906278505005.post-3313286937303127616</id><published>2008-07-11T10:18:00.003+01:00</published><updated>2008-07-11T10:24:58.061+01:00</updated><title type='text'>geclipse: a nice grid UI at last?</title><content type='html'>I've just been playing around with &lt;a href="http://www.geclipse.eu"&gt;geclipse&lt;/a&gt; and I like what I see. It wraps up the fiddly business of VOMS proxies, information system queries, etc. so you don't have to worry about them. Once I'd downloaded the latest milestone release via Eclipse's update manager and set up a VO I was able to submit a job. The WMS was discovered from the information system. It uses JSDL to describe jobs, but you fill in the description using dialog boxes, and it can also translate to JDL. There are lots of cool things that I haven't even looked at yet, like an interface to Amazon EC2 and to local batch systems (to view queues etc.), as well as visualisation plugins allowing things like interactive jobs.&lt;br /&gt;&lt;br /&gt;This looks like a great interface for grid beginners, especially those who're already familiar with Eclipse. 
I knew that sooner or later someone would get round to writing some good software for submitting grid jobs!&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/7250102906278505005-3313286937303127616?l=gridirelandops.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://gridirelandops.blogspot.com/feeds/3313286937303127616/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=7250102906278505005&amp;postID=3313286937303127616' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/7250102906278505005/posts/default/3313286937303127616'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/7250102906278505005/posts/default/3313286937303127616'/><link rel='alternate' type='text/html' href='http://gridirelandops.blogspot.com/2008/07/geclipse-nice-grid-ui-at-last.html' title='geclipse: a nice grid UI at last?'/><author><name>Stephen Childs</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-7250102906278505005.post-6828543659289226424</id><published>2008-04-03T10:21:00.005+01:00</published><updated>2008-04-03T10:44:46.901+01:00</updated><title type='text'>Who is that masked user?</title><content type='html'>Trying to get a better handle on usage of our cluster, I realised for the first time that Maui actually provides quite a nice way of displaying the efficiency of jobs. It doesn't sort them the way you'd like, but then that's what "sort" is for. 
Here's the "bottom 10" jobs on our system:&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;[root@gridgate gridmapdir]# showq -r|sort -n  -k 4|sed -e 's/^[ \t]*//' -e '/^$/d'|head -n 10&lt;br /&gt;JobName  S Par  Effic  XFactor  Q      User    Group    MHost Procs   Remaining            StartTime&lt;br /&gt;447 Jobs     447 of   683 Processors Active (65.45%)&lt;br /&gt;550825_ R DEF   7.53      0.1 DE    fus098   fusion    wn019     1     7:45:00  Fri Mar 28 11:16:42&lt;br /&gt;550438_ R DEF   9.31      0.1 DE    fus098   fusion    wn056     1     1:33:37  Fri Mar 28 05:05:04&lt;br /&gt;550818_ R DEF   9.51      0.0 DE    fus098   fusion    wn072     1     5:19:15  Fri Mar 28 08:50:40&lt;br /&gt;550439_ R DEF   9.65      0.1 DE    fus098   fusion    wn056     1     1:33:37  Fri Mar 28 05:05:04&lt;br /&gt;550429_ R DEF  10.08      0.0 DE    fus098   fusion    wn062     1    00:39:47  Fri Mar 28 04:11:18&lt;br /&gt;550437_ R DEF  10.19      0.1 DE    fus098   fusion    wn056     1     1:33:26  Fri Mar 28 05:05:04&lt;br /&gt;550417  R DEF  10.28      0.1 DE    fus098   fusion    wn011     1    00:27:03  Fri Mar 28 03:58:28&lt;br /&gt;550441_ R DEF  10.30      0.1 DE    fus098   fusion    wn056     1     1:33:40  Fri Mar 28 05:05:04&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;Looks like I need to find out who this fus098 guy is. Normally my method for doing this is to grep through &lt;code&gt;/var/log/globus-gatekeeper.log&lt;/code&gt; but I finally got sick of this and wrote a little python script to translate the funny system used in &lt;code&gt;/etc/grid-security/gridmapdir&lt;/code&gt; (documented &lt;a href="http://www.gridsite.org/gridmapdir/"&gt;here&lt;/a&gt;) and output the complete set of pool account mappings. I was going to implement all sorts of fancy options for outputting a particular user's mapping etc. but decided I could do what I needed with grep so I'll leave the fancification to someone else. 
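&lt;br /&gt;&lt;br /&gt;For anyone who'd rather roll their own, here is a minimal sketch of the idea in Python. It is not the script linked in this post. It assumes the layout described in the gridmapdir documentation: each pool account is an empty file, and a leased account gains a hard link named after the URL-encoded DN, so the two names share an inode. The function name pool_mappings and the test for a leading "%" are made up for illustration.&lt;br /&gt;
&lt;pre&gt;
```python
import os
from collections import defaultdict
from urllib.parse import unquote


def pool_mappings(gridmapdir):
    """Return sorted (account, dn) pairs for every leased pool account.

    Assumption: a leased account and its URL-encoded DN are hard links
    to the same file, so they can be paired up by inode number.
    """
    by_inode = defaultdict(list)
    for name in os.listdir(gridmapdir):
        info = os.stat(os.path.join(gridmapdir, name))
        by_inode[info.st_ino].append(name)

    pairs = []
    for names in by_inode.values():
        # Assumption: encoded DNs begin with "%2f" (an escaped "/"),
        # while pool account names never start with "%".
        accounts = [n for n in names if not n.startswith("%")]
        dns = [unquote(n) for n in names if n.startswith("%")]
        for account in accounts:
            for dn in dns:
                pairs.append((account, dn))
    return sorted(pairs)
```
&lt;/pre&gt;
Calling pool_mappings("/etc/grid-security/gridmapdir") and printing each pair joined with ":" should give lines in the same shape as the samples in this post; any suffix such as ":dteam" is simply kept as part of the decoded name.&lt;br /&gt;&lt;br /&gt;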
The script is available &lt;a href="http://grid.ie/distribution/scripts/poolmapping"&gt;here&lt;/a&gt; and here's some sample usage:&lt;br /&gt;&lt;br /&gt;What are the mappings for users with "childs" in their DN?&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;[childss@gridgate childss]$ ./poolmapping |grep -i childs&lt;br /&gt;dte053:/c=ie/o=grid-ireland/ou=cs.tcd.ie/l=ra-tcd/cn=stephen o. childs:dteam&lt;br /&gt;solovo003:/c=ie/o=grid-ireland/ou=cs.tcd.ie/l=ra-tcd/cn=stephen o. childs:solovo&lt;br /&gt;webcom050:/c=ie/o=grid-ireland/ou=cs.tcd.ie/l=ra-tcd/cn=stephen o. childs:webcom&lt;br /&gt;cosmo007:/c=ie/o=grid-ireland/ou=cs.tcd.ie/l=ra-tcd/cn=stephen o. childs&lt;br /&gt;cosmo004:/c=ie/o=grid-ireland/ou=cs.tcd.ie/l=ra-tcd/cn=stephen o. childs:cosmo&lt;br /&gt;gitest042:/c=ie/o=grid-ireland/ou=cs.tcd.ie/l=ra-tcd/cn=stephen o. childs:gitest&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;What DN is mapped to dte053?&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;[childss@gridgate childss]$ ./poolmapping |grep -i dte053&lt;br /&gt;dte053:/c=ie/o=grid-ireland/ou=cs.tcd.ie/l=ra-tcd/cn=stephen o. 
childs:dteam&lt;br /&gt;&lt;/pre&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/7250102906278505005-6828543659289226424?l=gridirelandops.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://gridirelandops.blogspot.com/feeds/6828543659289226424/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=7250102906278505005&amp;postID=6828543659289226424' title='5 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/7250102906278505005/posts/default/6828543659289226424'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/7250102906278505005/posts/default/6828543659289226424'/><link rel='alternate' type='text/html' href='http://gridirelandops.blogspot.com/2008/04/who-is-that-masked-user.html' title='Who is that masked user?'/><author><name>Stephen Childs</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>5</thr:total></entry><entry><id>tag:blogger.com,1999:blog-7250102906278505005.post-4792180999053262831</id><published>2008-03-11T09:16:00.002Z</published><updated>2008-03-11T09:20:36.008Z</updated><title type='text'>But my proxy hasn't expired!</title><content type='html'>We have been plagued with a frustrating problem (especially in our test environment). 
Users would generate a new proxy, submit a job immediately and then get an error like this:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;[childss@ui childss]$ edg-job-status https://cagraidsvr18.cs.tcd.ie:9000/nbPfABOjQHsG7IcFCJcYLg&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;*************************************************************&lt;br /&gt;BOOKKEEPING INFORMATION:&lt;br /&gt;&lt;br /&gt;Status info for the Job : https://cagraidsvr18.cs.tcd.ie:9000/nbPfABOjQHsG7IcFCJcYLg&lt;br /&gt;Current Status:     Aborted &lt;br /&gt;Status Reason:      Job proxy is expired.&lt;br /&gt;Destination:        gridgate02.testgrid.:2119/jobmanager-lcgpbs-test&lt;br /&gt;reached on:         Tue Mar 11 08:47:31 2008&lt;br /&gt;*************************************************************&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;&lt;br /&gt;Which is very annoying as the proxy obviously hasn't expired. It turns out that this is due to old jobs stuck on the RB (whose proxies &lt;b&gt;have&lt;/b&gt; expired). The problem can be cleared by logging onto the RB, identifying old jobs for the user's DN and removing them using condor_rm. I'll leave it to someone else to explain why this arises. 
I hope it's been fixed in the new WMS.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/7250102906278505005-4792180999053262831?l=gridirelandops.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://gridirelandops.blogspot.com/feeds/4792180999053262831/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=7250102906278505005&amp;postID=4792180999053262831' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/7250102906278505005/posts/default/4792180999053262831'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/7250102906278505005/posts/default/4792180999053262831'/><link rel='alternate' type='text/html' href='http://gridirelandops.blogspot.com/2008/03/but-my-proxy-hasnt-expired.html' title='But my proxy hasn&apos;t expired!'/><author><name>Stephen Childs</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-7250102906278505005.post-6502880833440238506</id><published>2008-02-12T08:05:00.000Z</published><updated>2008-02-12T08:34:43.108Z</updated><title type='text'>Can Quattor save the world?</title><content type='html'>Due to the wonders of planet, I've just seen &lt;a href="http://scotgrid.blogspot.com/2008/02/cluster-glue.html"&gt;this&lt;/a&gt; post by Andrew from Glasgow with the intriguing comment: "Are there any better tools? (is Quattor the savoiur for this type of problem)". This post was due to the frustrations of cobbling together fabric management from a collection of very good, but separate tools. So I thought I'd briefly describe some of the advantages of Quattor. 
I know many were burned in the early days of Quattor by its complexity and obscurity, but times really have changed and I suggest you revisit it. So here are just a few of the reasons I like it:&lt;br /&gt;&lt;br /&gt;&lt;ul&gt;&lt;br /&gt;&lt;li&gt;&lt;b&gt;It's got a &lt;a href="https://trac.lal.in2p3.fr/LCGQWG/wiki/Doc/panc"&gt;real programming language&lt;/a&gt;&lt;/b&gt;: this gives you data structures (e.g. hashes) and types, which allow validation of the values you type in: it will recognise that "123.34.32.O7" (spot the deliberate mistake) isn't an IP address.&lt;br /&gt;&lt;li&gt;&lt;b&gt;The gLite configuration is up to date with YAIM (and often ahead of it)&lt;/b&gt;: Michel Jouvin has led the way on a number of deployment issues in LCG (e.g. 64-bit WNs, space tokens, etc.) and all this stuff gets into Quattor before YAIM. (Also DNS-style VO-names, Xen configuration, etc., etc.) We have found that whenever we have to do something non-standard (e.g. publishing multiple different jobmanagers from one CE in GIP) it's a doddle in Quattor due to the availability of proper data structures (see above).&lt;br /&gt;&lt;li&gt;&lt;b&gt;The &lt;a href="https://trac.lal.in2p3.fr/LCGQWG/"&gt;Quattor Working Group templates&lt;/a&gt; are effectively a complete Grid distribution in a way that gLite itself isn't&lt;/b&gt;. What I mean is that they provide all you need for going from bare metal to installation of a complete SL-based Grid site. This is ideal for new/small sites.&lt;br /&gt;&lt;li&gt;&lt;b&gt;It's a true community effort&lt;/b&gt;: having been involved in YAIM development for MPI, I have first-hand knowledge of the protracted process involved in getting &lt;i&gt;anything&lt;/i&gt; fixed in gLite. In contrast, Quattor functions as a true OSS project: if there's a problem, you fix it and check it in. If it passes muster after a lightweight review, it's included in the core release. 
Problem solved.&lt;br /&gt;&lt;li&gt;&lt;b&gt;It provides integration with installation and monitoring&lt;/b&gt;: the contents of configuration profiles for a machine are directly used to generate Kickstart templates, and monitoring (using &lt;a href="http://lemon.web.cern.ch/lemon/index.shtml"&gt;Lemon&lt;/a&gt;) is also tightly integrated with a raft of sensors and alerts available.&lt;br /&gt;&lt;/ul&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/7250102906278505005-6502880833440238506?l=gridirelandops.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://gridirelandops.blogspot.com/feeds/6502880833440238506/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=7250102906278505005&amp;postID=6502880833440238506' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/7250102906278505005/posts/default/6502880833440238506'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/7250102906278505005/posts/default/6502880833440238506'/><link rel='alternate' type='text/html' href='http://gridirelandops.blogspot.com/2008/02/can-quattor-save-world.html' title='Can Quattor save the world?'/><author><name>Stephen Childs</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-7250102906278505005.post-8837031644976148271</id><published>2008-02-04T10:04:00.000Z</published><updated>2008-02-04T10:30:34.184Z</updated><title type='text'>Play it again, SAM</title><content type='html'>After much pain, we have finally got a SAM server up and running for Grid-Ireland (see &lt;a 
href="https://cagraidsvr16.cs.tcd.ie/sam/sam.py"&gt;here&lt;/a&gt;). We used to run an SFT server, but it was ancient and when eventually the client software became incompatible with the UI distribution, we decided to move to SAM. It looked like there were quite good installation docs &lt;a href="http://sam-docs.web.cern.ch/sam-docs/index.php?dir=./admin/&amp;amp;"&gt;available&lt;/a&gt; so we assigned it to someone as a Friday afternoon project. That was two months ago! It turned out that the documentation, while good, had a few critical errors/omissions in it, and the support was non-existent. We've finally got it sorted now (the last problem was solved when I divined by reading the source code that you had to define an ACL of approved DNs in the config file) and it looks like it should be useful in keeping track of our non-EGEE sites. We'll try and feed our experience back upstream, or (probably more useful) stick it on a public page so it makes it into Google.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/7250102906278505005-8837031644976148271?l=gridirelandops.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://gridirelandops.blogspot.com/feeds/8837031644976148271/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=7250102906278505005&amp;postID=8837031644976148271' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/7250102906278505005/posts/default/8837031644976148271'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/7250102906278505005/posts/default/8837031644976148271'/><link rel='alternate' type='text/html' href='http://gridirelandops.blogspot.com/2008/02/play-it-again-sam.html' title='Play it again, SAM'/><author><name>Stephen 
Childs</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-7250102906278505005.post-3106489491776699094</id><published>2008-02-01T09:11:00.000Z</published><updated>2008-02-01T09:26:06.467Z</updated><title type='text'>Stepping through the pgrade portal</title><content type='html'>As a Grid veteran, I normally submit jobs using edg-job-*, and at this stage I've almost given up hope that there could be a less painful way of getting jobs onto the Grid. I've tried &lt;a href="http://cern.ch/ganga"&gt;Ganga&lt;/a&gt; in the past, and it was promising, but it didn't work well with the broken MPI on the EGEE grid, so I kind of gave up on it. The latest thing we've installed is the &lt;a href="http://www.lpds.sztaki.hu/pgrade/"&gt;p-grade&lt;/a&gt; portal, which has been around for a good while and is allegedly getting "mature" now. The first problem after creating an account was getting my cert set up for use in the portal. I had the cert and key on my local machine, and tried to upload them to a MyProxy server to get something the portal could use. At this point, I was asked for the hostname and port number of the MyProxy service. Now, I actually administer the MyProxy server, and still had to ask a colleague which port it ran on. There is no way in the world a user should have to know this, but apparently you can't set defaults in the portal. We're still running version 2.5, so maybe it's fixed in 2.6.&lt;br /&gt;&lt;br /&gt;Once I got my cert up and running, I went to submit a job. My first job, the challenging "/bin/hostname" test, failed. I didn't expect that. Apparently pgrade uploads a binary from your local machine by default rather than executing something hosted on the remote machine. 
As my local machine is FC6 and the execute node is SL3, the uploaded hostname binary wouldn't run. So if you wanted to run the hostname program on the remote host, you would have to upload a script which ran /bin/hostname.&lt;br /&gt;&lt;br /&gt;The next challenge was how to add input files to the job. It turns out that this is done by adding "ports" to the job node you define. Everything in pgrade is a workflow, so files are ports that allow data to flow between nodes (or from the local machine). It takes a little while to get used to this approach.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/7250102906278505005-3106489491776699094?l=gridirelandops.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://gridirelandops.blogspot.com/feeds/3106489491776699094/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=7250102906278505005&amp;postID=3106489491776699094' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/7250102906278505005/posts/default/3106489491776699094'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/7250102906278505005/posts/default/3106489491776699094'/><link rel='alternate' type='text/html' href='http://gridirelandops.blogspot.com/2008/02/stepping-through-pgrade-portal.html' title='Stepping through the pgrade portal'/><author><name>Stephen Childs</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry></feed>
