gLite Middleware - Bugs

 
 

bug #70808: PBS submission script creates wrong stagein/stageout directives with qsub's -W option

Submitted by:  Eygene Ryabinkin <konvpalto>
Submitted on:  2010-07-29 16:15  
 
Status: Ready for Review
Open/Closed: Closed
Category: BLAH
Severity: 5 - Major
Baseline Release (where bug has been observed): gLite 3.2
Release (where bug fix will be available: EMI 1, EMI 2, EMI 3, All): EMI 2
OS: SL 5
Architecture: None
Bug detection area: Production
Assigned to: None
Privacy: Public
Priority: Medium
Associated Test: None
GGUS reference URL: https://gus.fzk.de/ws/ticket_info.php?ticket=60645
Component tag(s): glite-ce-blahp_R_1_14_4_1_GL32
Subsystem tag(s): glite-ce_R_1_12_2_1_GL32

Build environment: None


2012-01-31 16:04, comment #10:

Hi Eygene,
have you had a look at bug #89527?
We had to add two extra syntaxes since version 2.5.8, maybe they work with 2.5.10 too.
The default now is not to repeat the 'stagein=' keyword, but to have the whole -W directive enclosed in escaped quotes, e.g.

-W \'stagein=a@h:b,c@h:d\'

This seems to work with all the versions up to 2.5.8, I hope it works with 2.5.10 too!
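For illustration only, here is a minimal sketch of how a script could emit the single-keyword, quoted form shown above. The function name build_quoted_stage_directive is hypothetical and is not taken from BLAH; the sketch only reproduces the text of the example, and how the escaped quotes are later consumed (by the shell or by qsub) is not stated in this thread.

```shell
#!/bin/sh
# Hypothetical helper, not part of pbs_submit.sh.
# Usage: build_quoted_stage_directive stagein|stageout SPEC [SPEC ...]
# Each SPEC has the form local_file@host:remote_file.
build_quoted_stage_directive() {
    keyword=$1
    shift
    list=""
    for spec in "$@"; do
        # Append the spec, adding a comma only when the list is non-empty.
        list="${list:+${list},}${spec}"
    done
    # Print nothing when no files were given.
    [ -n "$list" ] || return 0
    # Emit one keyword followed by the whole list, wrapped in
    # backslash-escaped single quotes as in the example above.
    printf '%s\n' "-W \\'${keyword}=${list}\\'"
}

build_quoted_stage_directive stagein a@h:b c@h:d
```

The repeated-keyword form discussed in the earlier comments would need a different join step; this sketch covers only the quoted variant.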

David

David Rebatto <rebatto>
Project Member
2012-01-31 12:43, comment #9:

You will be laughing, but it seems that Torque 2.5.10 broke even the '-W stagein=a@h:b,stagein=c@h:d' syntax.

There are patches submitted to the Torque developers,
http://www.clusterresources.com/bug...
that fix both issues and allow EGI people to get rid of the multiple staging workaround.

May I hijack this ticket (since it is already open and carries the gory details) and ask whether these patches can receive some testing at sites other than our own CREAM CE?

Thanks.

Eygene Ryabinkin <konvpalto>
2010-10-22 15:53, comment #8:

Fix certified, further details at https://twiki.cnaf.infn.it/twiki/bi...

Paolo Andreetto <pandreet>
Project Member
2010-09-03 12:11, comment #7:

Since people seem to be -sure- that the multiple stagein/out option worked with all relevant versions of PBS/Torque, I committed the suggested patch in the v1.16.x branch of BLAH.

Francesco Prelz <fprelz>
Project Member
2010-09-03 11:24, comment #6:

In fact, it will be easier just to emit a separate stagein/stageout directive for each file, to avoid regressions on any Torque version. As far as I remember, this has always been the case for the lcg-CE.

Eygene Ryabinkin <konvpalto>
2010-09-01 16:27, comment #5:

Does anyone know for sure (or from the source) whether the multiple stagein directive works on all previous releases of PBS/Torque? The oldest version I could test this on is Torque 1.0.1, and it looks OK.
We would introduce a regression in BLAH if this were not true, however.

As this is no doubt a Torque bug (they break the very example they give in the manpage for qsub), it is probably better to include an explicit workaround for the affected Torque versions. As far as I can tell, 'smart_strtok' appeared in 2.4.6. Hoping this will be fixed before 2.5 appears, what we could do in BLAH is patch pbs_submit.sh to check pbs_version from qmgr -c "list server" and use the multiple-stagein format for versions 2.4.x with x>=6.

How does this sound? Would you like us to add this to the next version of BLAH?
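As a rough sketch of the version check proposed above (not code from BLAH): the helper names extract_pbs_version and needs_multiple_stagein are hypothetical, and the parsing assumes that the qmgr -c "list server" output contains a line of the form "pbs_version = X.Y.Z".

```shell
#!/bin/sh
# Hypothetical helpers, not part of pbs_submit.sh.

# Read `qmgr -c "list server"` output on stdin and print the pbs_version value.
# Assumes a line of the form "pbs_version = X.Y.Z" (an assumption, see above).
extract_pbs_version() {
    sed -n 's/^[[:space:]]*pbs_version[[:space:]]*=[[:space:]]*//p' | head -n 1
}

# Succeed (exit status 0) only for versions 2.4.x with x >= 6.
needs_multiple_stagein() {
    version=$1
    case $version in
        2.4.*) ;;
        *) return 1 ;;
    esac
    # Drop the "2.4." prefix, then anything from the first non-digit onwards.
    patch=${version#2.4.}
    patch=${patch%%[!0-9]*}
    [ -n "$patch" ] && [ "$patch" -ge 6 ]
}

# Intended use on a host where qmgr is available:
#   version=$(qmgr -c "list server" | extract_pbs_version)
#   if needs_multiple_stagein "$version"; then ...; fi
```

Keeping the parsing and the comparison in separate functions lets the version test be exercised without a running PBS server.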

Francesco Prelz <fprelz>
Project Member
2010-07-29 21:50, comment #4:

For the record, the test version of the Torque patch is attached.

(file #15256)

Eygene Ryabinkin <konvpalto>
2010-07-29 21:21, comment #3:

OK, I have created a patch for Torque that restores the needed functionality, and I will push it upstream.

Thus, you can close this ticket, but it may still be worth applying the patch, because this is a known regression. Then again, gLite packages its own versions of Torque, so users of a purely "official" distribution won't be affected.

Eygene Ryabinkin <konvpalto>
2010-07-29 20:15, comment #2:

Yes, it is exactly the same issue.

I was under the impression that even Torque 2.4.4 had this problem, but I will look into it once again.

Eygene Ryabinkin <konvpalto>
2010-07-29 16:28, comment #1:

On 4 Jun 2010, Andrey Kiryanov wrote the following in the LCG-ROLLOUT mailing list:

Is this the same issue ?

Massimo Sgaravatto <sgaravat>
Project Member
2010-07-29 16:15, original submission:

Currently, /opt/glite/bin/pbs_submit.sh creates a single, comma-separated list of all stagein/stageout file specifications, like file1@host:source1,file2@host:source2, and passes it as "-W stagein=<LIST>". Qsub's manual allows this on paper (http://www.clusterresources.com/tor...), but in reality qsub does not accept such a list and wants a directive like "-W stagein=file1@host:source1,stagein=file2@host:source2". In other words, it expects a list of directives for -W, not a list of stagein/stageout objects.

I have verified this behaviour against the sources of Torque's qsub in 2.4.4, 2.4.8 and 2.4.9. One can also verify it by looking into src/cmds/qsub.c, routine process_opts(), and searching for the string "case 'W':".

Attached is the patch for pbs_submit.sh that modifies its handling of stagein/stageout arguments to implement this behaviour.
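The attached patch is not reproduced here. As a minimal sketch of the idea only, assuming a hypothetical helper named build_stage_directive (this name does not come from pbs_submit.sh), the repeated-keyword form described above could be built like this:

```shell
#!/bin/sh
# Hypothetical helper, not taken from pbs_submit.sh or the attached patch.
# Usage: build_stage_directive stagein|stageout SPEC [SPEC ...]
# Each SPEC has the form local_file@host:remote_file.
build_stage_directive() {
    keyword=$1
    shift
    directive=""
    for spec in "$@"; do
        # Repeat the keyword for every file instead of emitting one
        # comma-separated list after a single keyword.
        if [ -z "$directive" ]; then
            directive="${keyword}=${spec}"
        else
            directive="${directive},${keyword}=${spec}"
        fi
    done
    # Print nothing when no files were given.
    if [ -n "$directive" ]; then
        printf '%s\n' "-W ${directive}"
    fi
}

build_stage_directive stagein file1@host:source1 file2@host:source2
```

The final call prints the directive in the form that qsub is reported to accept in this report.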

My environment is as follows:
{{{
# rpm -qa | grep -iE '(cream|blah)'
glite-yaim-cream-ce-4.1.0-9
glite-ce-cream-utils-1.0.0-13.sl5
glite-ce-blahp-1.14.2-3.sl5
glite-CREAM-3.2.6-0.sl5
glite-ce-cream-1.12.2-1
}}}

Eygene Ryabinkin <konvpalto>

 

Attached Files
file #15256:  torque-2.4.9-qsub-fix-W-regression.patch added by konvpalto (2kB - application/octet-stream)
file #15245:  pbs_submit.sh-fix-stagein-stageout.patch added by konvpalto (1kB - application/octet-stream)

 

Depends on the following items: None found


 

Carbon-Copy List
  • -unavailable- added by rebatto (Posted a comment)
  • -unavailable- added by pguerrer (Updated the item)
  • -unavailable- added by pandreet (Updated the item)
  • -unavailable- added by fprelz (Posted a comment)
  • -unavailable- added by ype (automatically added by cronjob)
  • -unavailable- added by sburke
  • -unavailable- added by sgaravat (Posted a comment)
  • -unavailable- added by konvpalto (Submitted the item)
    Follow 16 latest changes.

    Date | Changed By | Updated Field | Previous Value => Replaced By
    2012-01-31 16:04 | rebatto | Release (where bug fix will be available: EMI 1, EMI 2, EMI 3, All) | => EMI 2
    2012-01-31 16:04 | rebatto | Assigned to | egeetest => None
    2010-11-10 15:54 | pguerrer | Status | Fix Certified => Ready for Review
    2010-11-10 15:54 | pguerrer | Open/Closed | -Automatic update due to transitions settings- => Closed
    2010-10-22 15:53 | pandreet | Status | Ready for Test => Fix Certified
    2010-09-24 10:48 | pandreet | Component tag(s) | => glite-ce-blahp_R_1_14_4_1_GL32
    2010-09-24 10:48 | pandreet | Subsystem tag(s) | => glite-ce_R_1_12_2_1_GL32
    2010-09-19 22:40 | sgaravat | Status | Integration Candidate => Ready for Test
    2010-09-19 22:40 | sgaravat | Assigned to | -Automatic update due to transitions settings- => egeetest
    2010-09-03 12:11 | fprelz | Status | In progress => Integration Candidate
    2010-09-03 12:11 | fprelz | Status | None => In progress
    2010-07-29 21:50 | konvpalto | Attached File | - => Added torque-2.4.9-qsub-fix-W-regression.patch, #15256
    2010-07-29 16:41 | sburke | Carbon-Copy | - => Added sburke
    2010-07-29 16:15 | konvpalto | Assigned to | -Automatic update due to transitions settings- => fprelz
    2010-07-29 16:15 | konvpalto | Priority | -Automatic update due to transitions settings- => Medium
    2010-07-29 16:15 | konvpalto | Attached File | - => Added pbs_submit.sh-fix-stagein-stageout.patch, #15245