From angel at miami.edu Mon Dec 1 10:25:34 2003
From: angel at miami.edu (Angel Li)
Date: Mon, 01 Dec 2003 13:25:34 -0500
Subject: [Rocks-Discuss]cluster-fork
Message-ID: <3FCB879E.8050905@miami.edu>
Hi,
I recently installed Rocks 3.0 on a Linux cluster and when I run the
command "cluster-fork" I get this error:
apple* cluster-fork ls
Traceback (innermost last):
File "/opt/rocks/sbin/cluster-fork", line 88, in ?
import rocks.pssh
File "/opt/rocks/lib/python/rocks/pssh.py", line 96, in ?
import gmon.encoder
ImportError: Bad magic number in
/usr/lib/python1.5/site-packages/gmon/encoder.pyc
Any thoughts? I'm also wondering where to find the python sources for
files in /usr/lib/python1.5/site-packages/gmon.
Thanks,
Angel
From jghobrial at uh.edu Mon Dec 1 11:35:06 2003
From: jghobrial at uh.edu (Joseph)
Date: Mon, 1 Dec 2003 13:35:06 -0600 (CST)
Subject: [Rocks-Discuss]cluster-fork
In-Reply-To: <3FCB879E.8050905@miami.edu>
References: <3FCB879E.8050905@miami.edu>
Message-ID: <Pine.LNX.4.56.0312011331460.5615@mail.tlc2.uh.edu>
On Mon, 1 Dec 2003, Angel Li wrote:
Hello Angel, I have the same problem, and so far there has been no response
since I posted about this a month ago.
Is your frontend an AMD setup??
I am thinking this is an AMD problem.
Thanks,
Joseph
> Hi,
>
> I recently installed Rocks 3.0 on a Linux cluster and when I run the
> command "cluster-fork" I get this error:
>
> apple* cluster-fork ls
> Traceback (innermost last):
> File "/opt/rocks/sbin/cluster-fork", line 88, in ?
> import rocks.pssh
> File "/opt/rocks/lib/python/rocks/pssh.py", line 96, in ?
> import gmon.encoder
> ImportError: Bad magic number in
> /usr/lib/python1.5/site-packages/gmon/encoder.pyc
>
> Any thoughts? I'm also wondering where to find the python sources for
> files in /usr/lib/python1.5/site-packages/gmon.
>
> Thanks,
>
> Angel
>
From tim.carlson at pnl.gov Mon Dec 1 14:58:54 2003
From: tim.carlson at pnl.gov (Tim Carlson)
Date: Mon, 01 Dec 2003 14:58:54 -0800 (PST)
Subject: [Rocks-Discuss]odd kickstart problem
In-Reply-To: <76AC0F5E-2025-11D8-804D-000393A4725A@sdsc.edu>
Message-ID: <Pine.LNX.4.44.0312011453020.22892-100000@scorpion.emsl.pnl.gov>
Trying to bring up an old dead node on a Rocks 2.3.2 cluster and I get the
following error in /var/log/httpd/error_log
Traceback (innermost last):
File "/opt/rocks/sbin/kgen", line 530, in ?
app.run()
File "/opt/rocks/sbin/kgen", line 497, in run
doc = FromXmlStream(file)
File "/usr/lib/python1.5/site-packages/xml/dom/ext/reader/Sax2.py", line
386, in FromXmlStream
return reader.fromStream(stream, ownerDocument)
File "/usr/lib/python1.5/site-packages/xml/dom/ext/reader/Sax2.py", line
372, in fromStream
self.parser.parse(s)
File "/usr/lib/python1.5/site-packages/xml/sax/expatreader.py", line 58,
in parse
xmlreader.IncrementalParser.parse(self, source)
File "/usr/lib/python1.5/site-packages/xml/sax/xmlreader.py", line 125,
in parse
self.close()
File "/usr/lib/python1.5/site-packages/xml/sax/expatreader.py", line
154, in close
self.feed("", isFinal = 1)
File "/usr/lib/python1.5/site-packages/xml/sax/expatreader.py", line
148, in feed
self._err_handler.fatalError(exc)
File "/usr/lib/python1.5/site-packages/xml/dom/ext/reader/Sax2.py", line
340, in fatalError
raise exception
xml.sax._exceptions.SAXParseException: <stdin>:3298:0: no element found
Doing a wget of
http://frontend-0/install/kickstart.cgi?arch=i386&np=2&project=rocks
on one of the working internal nodes yields the same error.
Any thoughts on this?
I've also done a fresh
rocks-dist dist
Tim
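[Editor's note: the "no element found" SAXParseException above generally means the parser hit end-of-input without seeing a complete document, i.e. the kickstart XML stream was empty or truncated at line 3298. The failure mode can be reproduced with the standard library alone; this sketch is generic and not Rocks-specific:]

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler

def first_parse_error(data):
    """Parse XML bytes; return "line:col: message" for the first fatal
    parse error, or None if the document is well-formed."""
    try:
        # A do-nothing ContentHandler is enough to drive well-formedness checks.
        xml.sax.parse(io.BytesIO(data), ContentHandler())
        return None
    except xml.sax.SAXParseException as e:
        # getLineNumber()/getColumnNumber() locate the failure, just like the
        # "<stdin>:3298:0: no element found" seen in the traceback.
        return "%d:%d: %s" % (e.getLineNumber(), e.getColumnNumber(),
                              e.getMessage())

# An empty or truncated stream reports "no element found".
```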
From sjenks at uci.edu Mon Dec 1 15:35:54 2003
From: sjenks at uci.edu (Stephen Jenks)
Date: Mon, 1 Dec 2003 15:35:54 -0800
Subject: [Rocks-Discuss]cluster-fork
In-Reply-To: <Pine.LNX.4.56.0312011331460.5615@mail.tlc2.uh.edu>
References: <3FCB879E.8050905@miami.edu>
<Pine.LNX.4.56.0312011331460.5615@mail.tlc2.uh.edu>
Message-ID: <1B15A45F-2457-11D8-A374-00039389B580@uci.edu>
FYI, I have a dual Athlon frontend and didn't have that problem. I know
that doesn't exactly help you, but at least it doesn't fail on all AMD
machines.
It looks like the .pyc file might be corrupt in your installation. The
source .py file (encoder.py) is in the
/usr/lib/python1.5/site-packages/gmon directory, so perhaps removing
the .pyc file would regenerate it (if you run cluster-fork as root?)
The md5sum for encoder.pyc on my system is:
459c78750fe6e065e9ed464ab23ab73d encoder.pyc
So you can check if yours is different.
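[Editor's note: if md5sum isn't handy, the same checksum can be computed with Python's hashlib; a generic sketch, where the encoder.pyc path is simply the file discussed in this thread:]

```python
import hashlib

def md5sum(path, chunksize=8192):
    """Compute the hex md5 digest of a file, reading it in chunks so large
    files don't have to fit in memory."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunksize), b""):
            h.update(block)
    return h.hexdigest()

# e.g. md5sum("/usr/lib/python1.5/site-packages/gmon/encoder.pyc")
```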
Steve Jenks
On Dec 1, 2003, at 11:35 AM, Joseph wrote:
> On Mon, 1 Dec 2003, Angel Li wrote:
> Hello Angel, I have the same problem and so far there is no response
> when
> I posted about this a month ago.
>
> Is your frontend an AMD setup??
>
> I am thinking this is an AMD problem.
>
> Thanks,
> Joseph
>
>
>> Hi,
>>
>> I recently installed Rocks 3.0 on a Linux cluster and when I run the
>> command "cluster-fork" I get this error:
>>
>> apple* cluster-fork ls
>> Traceback (innermost last):
>> File "/opt/rocks/sbin/cluster-fork", line 88, in ?
>> import rocks.pssh
>> File "/opt/rocks/lib/python/rocks/pssh.py", line 96, in ?
>> import gmon.encoder
>> ImportError: Bad magic number in
>> /usr/lib/python1.5/site-packages/gmon/encoder.pyc
>>
>> Any thoughts? I'm also wondering where to find the python sources for
>> files in /usr/lib/python1.5/site-packages/gmon.
>>
>> Thanks,
>>
>> Angel
>>
From mjk at sdsc.edu Mon Dec 1 19:03:16 2003
From: mjk at sdsc.edu (Mason J. Katz)
Date: Mon, 1 Dec 2003 19:03:16 -0800
Subject: [Rocks-Discuss]odd kickstart problem
In-Reply-To: <Pine.LNX.4.44.0312011453020.22892-100000@scorpion.emsl.pnl.gov>
References: <Pine.LNX.4.44.0312011453020.22892-100000@scorpion.emsl.pnl.gov>
Message-ID: <132DD626-2474-11D8-A7A4-000A95DA5638@sdsc.edu>
You'll need to run the kpp and kgen steps (what kickstart.cgi does for
you) manually to find out if this is an XML error.
# cd /home/install/profiles/current
# kpp compute
This will generate a kickstart file for a compute node, although some
information will be missing since it isn't specific to a node (unlike
what ./kickstart.cgi --client=node-name generates). But what this does
do is traverse the XML graph and build a monolithic XML kickstart
profile. If this step works, you can then pipe ("|") the output into kgen
to convert the XML to kickstart syntax. Something in this procedure
should fail and point to the error.
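[Editor's note: the isolate-the-failing-stage approach described here can be sketched generically: run each stage of a pipeline separately and report which one fails. The kpp/kgen commands are from the message above; the helper itself is an illustrative assumption, not part of Rocks:]

```python
import subprocess

def run_pipeline(stages):
    """Run shell commands as a pipeline, feeding each stage's stdout to
    the next stage's stdin. Returns (failed_stage_index, returncode) for
    the first stage that exits nonzero, or None if all stages succeed."""
    data = b""
    for i, cmd in enumerate(stages):
        proc = subprocess.run(cmd, shell=True, input=data,
                              stdout=subprocess.PIPE)
        if proc.returncode != 0:
            return (i, proc.returncode)
        data = proc.stdout
    return None

# e.g. run_pipeline(["kpp compute", "kgen"]) tells you whether the XML
# traversal (kpp) or the kickstart conversion (kgen) is the step that fails.
```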
-mjk
On Dec 1, 2003, at 2:58 PM, Tim Carlson wrote:
> Trying to bring up an old dead node on a Rocks 2.3.2 cluster and I get
> the
> following error in /var/log/httpd/error_log
>
>
> Traceback (innermost last):
> File "/opt/rocks/sbin/kgen", line 530, in ?
> app.run()
> File "/opt/rocks/sbin/kgen", line 497, in run
> doc = FromXmlStream(file)
> File "/usr/lib/python1.5/site-packages/xml/dom/ext/reader/Sax2.py",
> line
> 386, in FromXmlStream
> return reader.fromStream(stream, ownerDocument)
> File "/usr/lib/python1.5/site-packages/xml/dom/ext/reader/Sax2.py",
> line
> 372, in fromStream
> self.parser.parse(s)
> File "/usr/lib/python1.5/site-packages/xml/sax/expatreader.py", line
> 58,
> in parse
> xmlreader.IncrementalParser.parse(self, source)
> File "/usr/lib/python1.5/site-packages/xml/sax/xmlreader.py", line
> 125,
> in parse
> self.close()
> File "/usr/lib/python1.5/site-packages/xml/sax/expatreader.py", line
> 154, in close
> self.feed("", isFinal = 1)
> File "/usr/lib/python1.5/site-packages/xml/sax/expatreader.py", line
> 148, in feed
> self._err_handler.fatalError(exc)
> File "/usr/lib/python1.5/site-packages/xml/dom/ext/reader/Sax2.py",
> line
> 340, in fatalError
> raise exception
> xml.sax._exceptions.SAXParseException: <stdin>:3298:0: no element found
>
>
> Doing a wget of
> http://frontend-0/install/kickstart.cgi?
> arch=i386&np=2&project=rocks
> on one of the working internal nodes yields the same error.
>
> Any thoughts on this?
>
> I've also done a fresh
> rocks-dist dist
>
> Tim
From tim.carlson at pnl.gov Mon Dec 1 20:42:51 2003
From: tim.carlson at pnl.gov (Tim Carlson)
Date: Mon, 01 Dec 2003 20:42:51 -0800 (PST)
Subject: [Rocks-Discuss]odd kickstart problem
In-Reply-To: <132DD626-2474-11D8-A7A4-000A95DA5638@sdsc.edu>
Message-ID: <Pine.GSO.4.44.0312012040250.3148-100000@paradox.emsl.pnl.gov>
On Mon, 1 Dec 2003, Mason J. Katz wrote:
> You'll need to run the kpp and kgen steps (what kickstart.cgi does for
> your) manually to find if this is an XML error.
>
> # cd /home/install/profiles/current
> # kpp compute
That was the trick. This sent me down the correct path. I had uninstalled
SGE on the frontend (I was having problems with SGE and wanted to start
from scratch)
Adding the 2 SGE XML files back to /home/install/profiles/2.3.2/nodes/
fixed everything
Thanks!
Tim
From landman at scalableinformatics.com Tue Dec 2 04:15:07 2003
From: landman at scalableinformatics.com (Joe Landman)
Date: Tue, 02 Dec 2003 07:15:07 -0500
Subject: [Rocks-Discuss]supermicro based MB's
Message-ID: <3FCC824B.5060406@scalableinformatics.com>
Folks:
Working on integrating a Supermicro MB based cluster. Discovered early
on that all of the compute nodes have an Intel based NIC that RedHat
doesn't know anything about (any version of RH). Some of the
administrative nodes have other similar issues. I am seeing a surprising
amount of misdetected or undetected hardware across the collection of MBs.
Anyone have advice on where to get modules/module source for Redhat
for these things? It looks like I will need to rebuild the boot CD,
though the several times I have tried this previously have failed to
produce a working/bootable system. It looks like new modules need to be
created/inserted into the boot process (head node and cluster nodes)
kernels, as well as into the installable kernels.
Has anyone done this for a Supermicro MB-based system? Thanks.
Joe
--
Joseph Landman, Ph.D
Scalable Informatics LLC,
email: landman at scalableinformatics.com
web : http://scalableinformatics.com
phone: +1 734 612 4615
From jghobrial at uh.edu Tue Dec 2 08:28:08 2003
From: jghobrial at uh.edu (Joseph)
Date: Tue, 2 Dec 2003 10:28:08 -0600 (CST)
Subject: [Rocks-Discuss]cluster-fork
In-Reply-To: <1B15A45F-2457-11D8-A374-00039389B580@uci.edu>
References: <3FCB879E.8050905@miami.edu>
<Pine.LNX.4.56.0312011331460.5615@mail.tlc2.uh.edu>
<1B15A45F-2457-11D8-A374-00039389B580@uci.edu>
Message-ID: <Pine.LNX.4.56.0312021000490.7581@mail.tlc2.uh.edu>
Indeed my md5sum is different for encoder.pyc. However, when I pulled the
file and ran "cluster-fork", python complained about an import problem. So it
seems that regeneration did not occur. Is there a flag I need to pass?
I have also tried to figure out what package provides encoder and
reinstall the package, but an rpm query reveals nothing.
If this is a generated file, what generates it?
It seems that an rpm file query on ganglia shows that files in the
directory belong to the package, but encoder.pyc does not.
Thanks,
Joseph
On Mon, 1 Dec 2003, Stephen Jenks wrote:
> FYI, I have a dual Athlon frontend and didn't have that problem. I know
> that doesn't exactly help you, but at least it doesn't fail on all AMD
> machines.
>
> It looks like the .pyc file might be corrupt in your installation. The
> source .py file (encoder.py) is in the
> /usr/lib/python1.5/site-packages/gmon directory, so perhaps removing
> the .pyc file would regenerate it (if you run cluster-fork as root?)
>
> The md5sum for encoder.pyc on my system is:
> 459c78750fe6e065e9ed464ab23ab73d encoder.pyc
> So you can check if yours is different.
>
> Steve Jenks
>
>
> On Dec 1, 2003, at 11:35 AM, Joseph wrote:
>
> > On Mon, 1 Dec 2003, Angel Li wrote:
> > Hello Angel, I have the same problem and so far there is no response
> > when
> > I posted about this a month ago.
> >
> > Is your frontend an AMD setup??
> >
> > I am thinking this is an AMD problem.
> >
> > Thanks,
> > Joseph
> >
> >
> >> Hi,
> >>
> >> I recently installed Rocks 3.0 on a Linux cluster and when I run the
> >> command "cluster-fork" I get this error:
> >>
> >> apple* cluster-fork ls
> >> Traceback (innermost last):
> >> File "/opt/rocks/sbin/cluster-fork", line 88, in ?
> >> import rocks.pssh
> >> File "/opt/rocks/lib/python/rocks/pssh.py", line 96, in ?
> >> import gmon.encoder
> >> ImportError: Bad magic number in
> >> /usr/lib/python1.5/site-packages/gmon/encoder.pyc
> >>
> >> Any thoughts? I'm also wondering where to find the python sources for
> >> files in /usr/lib/python1.5/site-packages/gmon.
> >>
> >> Thanks,
> >>
> >> Angel
> >>
>
From angel at miami.edu Tue Dec 2 09:02:55 2003
From: angel at miami.edu (Angel Li)
Date: Tue, 02 Dec 2003 12:02:55 -0500
Subject: [Rocks-Discuss]cluster-fork
In-Reply-To: <Pine.LNX.4.56.0312021000490.7581@mail.tlc2.uh.edu>
References: <3FCB879E.8050905@miami.edu>
<Pine.LNX.4.56.0312011331460.5615@mail.tlc2.uh.edu> <1B15A45F-2457-11D8-
A374-00039389B580@uci.edu> <Pine.LNX.4.56.0312021000490.7581@mail.tlc2.uh.edu>
Message-ID: <3FCCC5BF.3030903@miami.edu>
Joseph wrote:
>Indeed my md5sum is different for encoder.pyc. However, when I pulled the
>file and run "cluster-fork" python responds about an import problem. So it
>seems that regeneration did not occur. Is there a flag I need to pass?
>
>I have also tried to figure out what package provides encoder and
>reinstall the package, but an rpm query reveals nothing.
>
>If this is a generated file, what generates it?
>
>It seems that an rpm file query on ganglia show that files in the
>directory belong to the package, but encoder.pyc does not.
>
>Thanks,
>Joseph
>
>
>
>
I have finally found the python sources on the HPC roll CD, in
ganglia-python-3.0.0-2.i386.rpm. I'm not familiar with python, but it
seems python "compiles" the .py files to ".pyc" and then deletes the
source file the first time they are referenced? I also noticed that
there are two versions of python installed. Maybe the .pyc files from one
version won't load into the other?
Angel
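[Editor's note: the two-interpreters guess is on the right track: each Python version stamps its bytecode files with its own magic number, and a "Bad magic number" ImportError means the .pyc was written by a different interpreter than the one importing it. On a modern Python this can be checked directly; importlib did not exist in Python 1.5, so this sketch is illustrative rather than something to run on the cluster itself:]

```python
import importlib.util

def pyc_magic(path):
    """Return the 4-byte magic number at the start of a .pyc file."""
    with open(path, "rb") as f:
        return f.read(4)

def compiled_by_this_interpreter(path):
    """True if the .pyc's magic matches the bytecode format of the
    interpreter running this check."""
    return pyc_magic(path) == importlib.util.MAGIC_NUMBER
```

A mismatch here is exactly the situation in this thread: a .pyc emitted by one of the two installed pythons being imported by the other.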
From mjk at sdsc.edu Tue Dec 2 15:52:52 2003
From: mjk at sdsc.edu (Mason J. Katz)
Date: Tue, 2 Dec 2003 15:52:52 -0800
Subject: [Rocks-Discuss]cluster-fork
In-Reply-To: <3FCCC5BF.3030903@miami.edu>
References: <3FCB879E.8050905@miami.edu>
<Pine.LNX.4.56.0312011331460.5615@mail.tlc2.uh.edu> <1B15A45F-2457-11D8-
A374-00039389B580@uci.edu> <Pine.LNX.4.56.0312021000490.7581@mail.tlc2.uh.edu>
<3FCCC5BF.3030903@miami.edu>
Message-ID: <A43157DE-2522-11D8-A7A4-000A95DA5638@sdsc.edu>
Python creates the .pyc files for you, and does not remove the original
.py file. I would be extremely surprised if two "identical" .pyc files
had the same md5 checksum. I'd expect this to be more like a C .o file,
which always contains random data padding out to the end of a page and
32/64-bit word sizes. Still, this is just a guess; the real point is that
you can always remove the .pyc files and the .py will regenerate them
when imported (although standard UNIX file/dir permissions still apply).
What is the import error you get from cluster-fork?
-mjk
On Dec 2, 2003, at 9:02 AM, Angel Li wrote:
> Joseph wrote:
>
>> Indeed my md5sum is different for encoder.pyc. However, when I pulled
>> the file and run "cluster-fork" python responds about an import
>> problem. So it seems that regeneration did not occur. Is there a flag
>> I need to pass?
>>
>> I have also tried to figure out what package provides encoder and
>> reinstall the package, but an rpm query reveals nothing.
>>
>> If this is a generated file, what generates it?
>>
>> It seems that an rpm file query on ganglia show that files in the
>> directory belong to the package, but encoder.pyc does not.
>>
>> Thanks,
>> Joseph
>>
>>
>>
> I have finally found the python sources in the HPC rolls CD, filename
> ganglia-python-3.0.0-2.i386.rpm. I'm not familiar with python but it
> seems python "compiles" the .py files to ".pyc" and then deletes the
> source file the first time they are referenced? I also noticed that
> there are two versions of python installed. Maybe the pyc files from
> one version won't load into the other one?
>
> Angel
>
>
From vrowley at ucsd.edu Mon Dec 1 14:27:03 2003
From: vrowley at ucsd.edu (V. Rowley)
Date: Mon, 01 Dec 2003 14:27:03 -0800
Subject: [Rocks-Discuss]PXE boot problems
Message-ID: <3FCBC037.5000302@ucsd.edu>
We have installed a ROCKS 3.0.0 frontend on a DL380 and are trying to
install a compute node via PXE. We are getting an error similar to the
one mentioned in the archives, e.g.
> Loading initrd.img....
> Ready
>
> Failed to free base memory
>
We have upgraded to syslinux-2.07-1, per the suggestion in the archives,
but continue to get the same error. Any ideas?
--
Vicky Rowley email: vrowley at ucsd.edu
Biomedical Informatics Research Network work: (858) 536-5980
University of California, San Diego fax: (858) 822-0828
9500 Gilman Drive
La Jolla, CA 92093-0715
See pictures from our trip to China at http://www.sagacitech.com/Chinaweb
From naihh at imcb.a-star.edu.sg Tue Dec 2 18:50:55 2003
From: naihh at imcb.a-star.edu.sg (Nai Hong Hwa Francis)
Date: Wed, 3 Dec 2003 10:50:55 +0800
Subject: [Rocks-Discuss]RE: When will Sun Grid Engine be included inRocks 3 for
Itanium?
Message-ID: <5E118EED7CC277468A275F11EEEC39B94CCC22@EXIMCB2.imcb.a-star.edu.sg>
Hi Laurence,
I just downloaded the Rocks3.0 for IA32 and installed it but SGE is
still not working.
Any idea?
Nai Hong Hwa Francis
Institute of Molecular and Cell Biology (A*STAR)
30 Medical Drive
Singapore 117609.
DID: (65) 6874-6196
-----Original Message-----
From: Laurence Liew [mailto:laurence at scalablesys.com]
Sent: Thursday, November 20, 2003 2:53 PM
To: Nai Hong Hwa Francis
Cc: npaci-rocks-discussion at sdsc.edu
Subject: Re: [Rocks-Discuss]RE: When will Sun Grid Engine be included
inRocks 3 for Itanium?
Hi Francis
GridEngine roll is ready for ia32. We will get a ia64 native version
ready as soon as we get back from SC2003. It will be released in a few
weeks time.
Globus GT2.4 is included in the Grid Roll
Cheers!
Laurence
On Thu, 2003-11-20 at 10:13, Nai Hong Hwa Francis wrote:
>
> Hi,
>
> Does anyone have any idea when will Sun Grid Engine be included as
part
> of Rocks 3 distribution.
>
> I am a newbie to Grid Computing.
> Anyone have any idea on how to invoke Globus in Rocks to setup a Grid?
>
> Regards
>
> Nai Hong Hwa Francis
>
> Institute of Molecular and Cell Biology (A*STAR)
> 30 Medical Drive
> Singapore 117609
> DID: 65-6874-6196
>
> -----Original Message-----
> From: npaci-rocks-discussion-request at sdsc.edu
> [mailto:npaci-rocks-discussion-request at sdsc.edu]
> Sent: Thursday, November 20, 2003 4:01 AM
> To: npaci-rocks-discussion at sdsc.edu
> Subject: npaci-rocks-discussion digest, Vol 1 #613 - 3 msgs
>
> Send npaci-rocks-discussion mailing list submissions to
> npaci-rocks-discussion at sdsc.edu
>
> To subscribe or unsubscribe via the World Wide Web, visit
>
> http://lists.sdsc.edu/mailman/listinfo.cgi/npaci-rocks-discussion
> or, via email, send a message with subject or body 'help' to
> npaci-rocks-discussion-request at sdsc.edu
>
> You can reach the person managing the list at
> npaci-rocks-discussion-admin at sdsc.edu
>
> When replying, please edit your Subject line so it is more specific
> than "Re: Contents of npaci-rocks-discussion digest..."
>
>
> Today's Topics:
>
> 1. top500 cluster installation movie (Greg Bruno)
> 2. Re: Running Normal Application on Rocks Cluster -
> Newbie Question (Laurence Liew)
>
> --__--__--
>
> Message: 1
> To: npaci-rocks-discussion at sdsc.edu
> From: Greg Bruno <bruno at rocksclusters.org>
> Date: Tue, 18 Nov 2003 13:41:15 -0800
> Subject: [Rocks-Discuss]top500 cluster installation movie
>
> here's a crew of 7, installing the 201st fastest supercomputer in the
> world in under two hours on the showroom floor at SC 03:
>
> http://www.rocksclusters.org/rocks.mov
>
> warning: the above file is ~65MB.
>
> - gb
>
>
> --__--__--
>
> Message: 2
> Subject: Re: [Rocks-Discuss]Running Normal Application on Rocks
Cluster
> -
> Newbie Question
> From: Laurence Liew <laurenceliew at yahoo.com.sg>
> To: Leong Chee Shian <chee-shian.leong at schenker.com>
> Cc: npaci-rocks-discussion at sdsc.edu
> Date: Wed, 19 Nov 2003 12:31:18 +0800
>
> Chee Shian,
>
> Thanks for your call. We will take this off list and visit you next
week
> in your office as you requested.
>
> Cheers!
> laurence
>
>
>
> On Tue, 2003-11-18 at 17:29, Leong Chee Shian wrote:
> > I have just installed Rocks 3.0 with one frontend and two compute
> > node.
> >
> > A normal file based application is installed on the frontend and is
> > NFS shared to the compute nodes .
> >
> > Question is : When run 5 sessions of my applications , the CPU
> > utilization is all concentrated on the frontend node , nothing is
> > being passed on to the compute nodes . How do I make these 3
computers
> > to function as one and share the load ?
> >
> > Thanks everyone as I am really new to this clustering stuff..
> >
> > PS : The idea of exploring rocks cluster is to use a few inexpensive
> > intel machines to replace our existing multi CPU sun server,
> > suggestions and recommendations are greatly appreciated.
> >
> >
> > Leong
> >
> >
> >
>
>
>
> --__--__--
>
> _______________________________________________
> npaci-rocks-discussion mailing list
> npaci-rocks-discussion at sdsc.edu
> http://lists.sdsc.edu/mailman/listinfo.cgi/npaci-rocks-discussion
>
>
> End of npaci-rocks-discussion Digest
>
>
> DISCLAIMER:
> This email is confidential and may be privileged. If you are not the
intended recipient, please delete it and notify us immediately. Please
do not copy or use it for any purpose, or disclose its contents to any
other person as it may be an offence under the Official Secrets Act.
Thank you.
--
Laurence Liew
CTO, Scalable Systems Pte Ltd
7 Bedok South Road
Singapore 469272
Tel : 65 6827 3953
Fax : 65 6827 3922
Mobile: 65 9029 4312
Email : laurence at scalablesys.com
http://www.scalablesys.com
From laurence at scalablesys.com Tue Dec 2 19:10:08 2003
From: laurence at scalablesys.com (Laurence Liew)
Date: Wed, 03 Dec 2003 11:10:08 +0800
Subject: [Rocks-Discuss]RE: When will Sun Grid Engine be included
inRocks 3 for Itanium?
In-Reply-To: <5E118EED7CC277468A275F11EEEC39B94CCC22@EXIMCB2.imcb.a-star.edu.sg>
References:
<5E118EED7CC277468A275F11EEEC39B94CCC22@EXIMCB2.imcb.a-star.edu.sg>
Message-ID: <1070421007.2452.51.camel@scalable>
Hi,
SGE is in the SGE roll.
You need to download the base, hpc and sge roll.
The install is now different from V2.3.x
Cheers!
laurence
On Wed, 2003-12-03 at 10:50, Nai Hong Hwa Francis wrote:
> Hi Laurence,
>
> I just downloaded the Rocks3.0 for IA32 and installed it but SGE is
> still not working.
>
> Any idea?
>
> Nai Hong Hwa Francis
> Institute of Molecular and Cell Biology (A*STAR)
> 30 Medical Drive
> Singapore 117609.
> DID: (65) 6874-6196
>
> -----Original Message-----
> From: Laurence Liew [mailto:laurence at scalablesys.com]
> Sent: Thursday, November 20, 2003 2:53 PM
> To: Nai Hong Hwa Francis
> Cc: npaci-rocks-discussion at sdsc.edu
> Subject: Re: [Rocks-Discuss]RE: When will Sun Grid Engine be included
> inRocks 3 for Itanium?
>
> Hi Francis
>
> GridEngine roll is ready for ia32. We will get a ia64 native version
> ready as soon as we get back from SC2003. It will be released in a few
> weeks time.
>
> Globus GT2.4 is included in the Grid Roll
>
> Cheers!
> Laurence
>
>
> On Thu, 2003-11-20 at 10:13, Nai Hong Hwa Francis wrote:
> >
> > Hi,
> >
> > Does anyone have any idea when will Sun Grid Engine be included as
> part
> > of Rocks 3 distribution.
> >
> > I am a newbie to Grid Computing.
> > Anyone have any idea on how to invoke Globus in Rocks to setup a Grid?
> >
> > Regards
> >
> > Nai Hong Hwa Francis
> >
> > Institute of Molecular and Cell Biology (A*STAR)
> > 30 Medical Drive
> > Singapore 117609
> > DID: 65-6874-6196
> >
> > -----Original Message-----
> > From: npaci-rocks-discussion-request at sdsc.edu
> > [mailto:npaci-rocks-discussion-request at sdsc.edu]
> > Sent: Thursday, November 20, 2003 4:01 AM
> > To: npaci-rocks-discussion at sdsc.edu
> > Subject: npaci-rocks-discussion digest, Vol 1 #613 - 3 msgs
> >
> > Send npaci-rocks-discussion mailing list submissions to
> > npaci-rocks-discussion at sdsc.edu
> >
> > To subscribe or unsubscribe via the World Wide Web, visit
> >
> > http://lists.sdsc.edu/mailman/listinfo.cgi/npaci-rocks-discussion
> > or, via email, send a message with subject or body 'help' to
> > npaci-rocks-discussion-request at sdsc.edu
> >
> > You can reach the person managing the list at
> > npaci-rocks-discussion-admin at sdsc.edu
> >
> > When replying, please edit your Subject line so it is more specific
> > than "Re: Contents of npaci-rocks-discussion digest..."
> >
> >
> > Today's Topics:
> >
> > 1. top500 cluster installation movie (Greg Bruno)
> > 2. Re: Running Normal Application on Rocks Cluster -
> > Newbie Question (Laurence Liew)
> >
> > --__--__--
> >
> > Message: 1
> > To: npaci-rocks-discussion at sdsc.edu
> > From: Greg Bruno <bruno at rocksclusters.org>
> > Date: Tue, 18 Nov 2003 13:41:15 -0800
> > Subject: [Rocks-Discuss]top500 cluster installation movie
> >
> > here's a crew of 7, installing the 201st fastest supercomputer in the
> > world in under two hours on the showroom floor at SC 03:
> >
> > http://www.rocksclusters.org/rocks.mov
> >
> > warning: the above file is ~65MB.
> >
> > - gb
> >
> >
> > --__--__--
> >
> > Message: 2
> > Subject: Re: [Rocks-Discuss]Running Normal Application on Rocks
> Cluster
> > -
> > Newbie Question
> > From: Laurence Liew <laurenceliew at yahoo.com.sg>
> > To: Leong Chee Shian <chee-shian.leong at schenker.com>
> > Cc: npaci-rocks-discussion at sdsc.edu
> > Date: Wed, 19 Nov 2003 12:31:18 +0800
> >
> > Chee Shian,
> >
> > Thanks for your call. We will take this off list and visit you next
> week
> > in your office as you requested.
> >
> > Cheers!
> > laurence
> >
> >
> >
> > On Tue, 2003-11-18 at 17:29, Leong Chee Shian wrote:
> > > I have just installed Rocks 3.0 with one frontend and two compute
> > > node.
> > >
> > > A normal file based application is installed on the frontend and is
> > > NFS shared to the compute nodes .
> > >
> > > Question is : When run 5 sessions of my applications , the CPU
> > > utilization is all concentrated on the frontend node , nothing is
> > > being passed on to the compute nodes . How do I make these 3
> computers
> > > to function as one and share the load ?
> > >
> > > Thanks everyone as I am really new to this clustering stuff..
> > >
> > > PS : The idea of exploring rocks cluster is to use a few inexpensive
> > > intel machines to replace our existing multi CPU sun server,
> > > suggestions and recommendations are greatly appreciated.
> > >
> > >
> > > Leong
> > >
> > >
> > >
> >
> >
> >
> > --__--__--
> >
> > _______________________________________________
> > npaci-rocks-discussion mailing list
> > npaci-rocks-discussion at sdsc.edu
> > http://lists.sdsc.edu/mailman/listinfo.cgi/npaci-rocks-discussion
> >
> >
> > End of npaci-rocks-discussion Digest
> >
> >
> > DISCLAIMER:
> > This email is confidential and may be privileged. If you are not the
> intended recipient, please delete it and notify us immediately. Please
> do not copy or use it for any purpose, or disclose its contents to any
> other person as it may be an offence under the Official Secrets Act.
> Thank you.
--
Laurence Liew
CTO, Scalable Systems Pte Ltd
7 Bedok South Road
Singapore 469272
Tel : 65 6827 3953
Fax : 65 6827 3922
Mobile: 65 9029 4312
Email : laurence at scalablesys.com
http://www.scalablesys.com
From DGURGUL at PARTNERS.ORG Wed Dec 3 07:24:29 2003
From: DGURGUL at PARTNERS.ORG (Gurgul, Dennis J.)
Date: Wed, 3 Dec 2003 10:24:29 -0500
Subject: [Rocks-Discuss]RE: When will Sun Grid Engine be included
inRocks 3 for Itanium?
Message-ID: <BC447F1AD529D311B4DE0008C71BF2EB0AE157F7@phsexch7.mgh.harvard.edu>
Where do we find the SGE roll? Under Lhoste at http://rocks.npaci.edu/Rocks/
there is a "Grid" roll listed. Is SGE in that? The userguide doesn't mention
SGE.
Dennis J. Gurgul
Partners Health Care System
Research Management
Research Computing Core
617.724.3169
-----Original Message-----
From: npaci-rocks-discussion-admin at sdsc.edu
[mailto:npaci-rocks-discussion-admin at sdsc.edu]On Behalf Of Laurence Liew
Sent: Tuesday, December 02, 2003 10:10 PM
To: Nai Hong Hwa Francis
Cc: npaci-rocks-discussion at sdsc.edu
Subject: RE: [Rocks-Discuss]RE: When will Sun Grid Engine be included
inRocks 3 for Itanium?
Hi,
SGE is in the SGE roll.
You need to download the base, hpc and sge roll.
The install is now different from V2.3.x
Cheers!
laurence
On Wed, 2003-12-03 at 10:50, Nai Hong Hwa Francis wrote:
> Hi Laurence,
>
> I just downloaded the Rocks3.0 for IA32 and installed it but SGE is
> still not working.
>
> Any idea?
>
> Nai Hong Hwa Francis
> Institute of Molecular and Cell Biology (A*STAR)
> 30 Medical Drive
> Singapore 117609.
> DID: (65) 6874-6196
>
> -----Original Message-----
> From: Laurence Liew [mailto:laurence at scalablesys.com]
> Sent: Thursday, November 20, 2003 2:53 PM
> To: Nai Hong Hwa Francis
> Cc: npaci-rocks-discussion at sdsc.edu
> Subject: Re: [Rocks-Discuss]RE: When will Sun Grid Engine be included
> inRocks 3 for Itanium?
>
> Hi Francis
>
> GridEngine roll is ready for ia32. We will get a ia64 native version
> ready as soon as we get back from SC2003. It will be released in a few
> weeks time.
>
> Globus GT2.4 is included in the Grid Roll
>
> Cheers!
> Laurence
>
>
> On Thu, 2003-11-20 at 10:13, Nai Hong Hwa Francis wrote:
> >
> > Hi,
> >
> > Does anyone have any idea when will Sun Grid Engine be included as
> part
> > of Rocks 3 distribution.
> >
> > I am a newbie to Grid Computing.
> > Anyone have any idea on how to invoke Globus in Rocks to setup a Grid?
> >
> > Regards
> >
> > Nai Hong Hwa Francis
> >
> > Institute of Molecular and Cell Biology (A*STAR)
> > 30 Medical Drive
> > Singapore 117609
> > DID: 65-6874-6196
> >
> > -----Original Message-----
> > From: npaci-rocks-discussion-request at sdsc.edu
> > [mailto:npaci-rocks-discussion-request at sdsc.edu]
> > Sent: Thursday, November 20, 2003 4:01 AM
> > To: npaci-rocks-discussion at sdsc.edu
> > Subject: npaci-rocks-discussion digest, Vol 1 #613 - 3 msgs
> >
> > Send npaci-rocks-discussion mailing list submissions to
> > npaci-rocks-discussion at sdsc.edu
> >
> > To subscribe or unsubscribe via the World Wide Web, visit
> >
> > http://lists.sdsc.edu/mailman/listinfo.cgi/npaci-rocks-discussion
> > or, via email, send a message with subject or body 'help' to
> > npaci-rocks-discussion-request at sdsc.edu
> >
> > You can reach the person managing the list at
> > npaci-rocks-discussion-admin at sdsc.edu
> >
> > When replying, please edit your Subject line so it is more specific
> > than "Re: Contents of npaci-rocks-discussion digest..."
> >
> >
> > Today's Topics:
> >
> > 1. top500 cluster installation movie (Greg Bruno)
> > 2. Re: Running Normal Application on Rocks Cluster -
> > Newbie Question (Laurence Liew)
> >
> > --__--__--
> >
> > Message: 1
> > To: npaci-rocks-discussion at sdsc.edu
> > From: Greg Bruno <bruno at rocksclusters.org>
> > Date: Tue, 18 Nov 2003 13:41:15 -0800
> > Subject: [Rocks-Discuss]top500 cluster installation movie
> >
> > here's a crew of 7, installing the 201st fastest supercomputer in the
> > world in under two hours on the showroom floor at SC 03:
> >
> > http://www.rocksclusters.org/rocks.mov
> >
> > warning: the above file is ~65MB.
> >
> > - gb
> >
> >
> > --__--__--
> >
> > Message: 2
> > Subject: Re: [Rocks-Discuss]Running Normal Application on Rocks
> Cluster
> > -
> > Newbie Question
> > From: Laurence Liew <laurenceliew at yahoo.com.sg>
> > To: Leong Chee Shian <chee-shian.leong at schenker.com>
> > Cc: npaci-rocks-discussion at sdsc.edu
> > Date: Wed, 19 Nov 2003 12:31:18 +0800
> >
> > Chee Shian,
> >
> > Thanks for your call. We will take this off list and visit you next
> > week in your office as you requested.
> >
> > Cheers!
> > laurence
> >
> >
> >
> > On Tue, 2003-11-18 at 17:29, Leong Chee Shian wrote:
> > > I have just installed Rocks 3.0 with one frontend and two compute
> > > node.
> > >
> > > A normal file based application is installed on the frontend and is
> > > NFS shared to the compute nodes .
> > >
> > > Question is : When I run 5 sessions of my application, the CPU
> > > utilization is all concentrated on the frontend node; nothing is
> > > being passed on to the compute nodes. How do I make these 3
> > > computers function as one and share the load?
> > >
> > > Thanks everyone as I am really new to this clustering stuff..
> > >
> > > PS : The idea of exploring rocks cluster is to use a few inexpensive
> > > intel machines to replace our existing multi CPU sun server,
> > > suggestions and recommendations are greatly appreciated.
> > >
> > >
> > > Leong
> > >
> > >
> > >
> >
> >
> >
> > --__--__--
> >
> > _______________________________________________
> > npaci-rocks-discussion mailing list
> > npaci-rocks-discussion at sdsc.edu
> > http://lists.sdsc.edu/mailman/listinfo.cgi/npaci-rocks-discussion
> >
> >
> > End of npaci-rocks-discussion Digest
> >
> >
> > DISCLAIMER:
> > This email is confidential and may be privileged. If you are not the
> intended recipient, please delete it and notify us immediately. Please
> do not copy or use it for any purpose, or disclose its contents to any
> other person as it may be an offence under the Official Secrets Act.
> Thank you.
--
Laurence Liew
CTO, Scalable Systems Pte Ltd
7 Bedok South Road
Singapore 469272
Tel : 65 6827 3953
Fax : 65 6827 3922
Mobile: 65 9029 4312
Email : laurence at scalablesys.com
http://www.scalablesys.com
From bruno at rocksclusters.org Wed Dec 3 07:32:14 2003
From: bruno at rocksclusters.org (Greg Bruno)
Date: Wed, 3 Dec 2003 07:32:14 -0800
Subject: [Rocks-Discuss]RE: When will Sun Grid Engine be included inRo cks 3 for
Itanium?
In-Reply-To: <BC447F1AD529D311B4DE0008C71BF2EB0AE157F7@phsexch7.mgh.harvard.edu>
References: <BC447F1AD529D311B4DE0008C71BF2EB0AE157F7@phsexch7.mgh.harvard.edu>
Message-ID: <DF132702-25A5-11D8-86E6-000A95C4E3B4@rocksclusters.org>
> Where do we find the SGE roll? Under Lhoste at
> http://rocks.npaci.edu/Rocks/
> there is a "Grid" roll listed. Is SGE in that? The userguide doesn't
> mention
> SGE.
the SGE roll will be available in the upcoming v3.1.0 release.
scheduled release date is december 15th.
- gb
From jlkaiser at fnal.gov Wed Dec 3 08:35:18 2003
From: jlkaiser at fnal.gov (Joe Kaiser)
Date: Wed, 03 Dec 2003 10:35:18 -0600
Subject: [Rocks-Discuss]supermicro based MB's
In-Reply-To: <3FCC824B.5060406@scalableinformatics.com>
References: <3FCC824B.5060406@scalableinformatics.com>
Message-ID: <1070469318.12324.13.camel@nietzsche.fnal.gov>
Hi,
You don't say what version of Rocks you are using. The following is for
the X5DPA-GG board and Rocks 3.0. It requires modifying only the
pcitable in the boot image on the tftp server. I believe the procedure
for 2.3.2 requires a heck of a lot more work (but it may not); I would
have to dig deep for my notes about changing 2.3.2.
This should be done on the frontend:
cd /tftpboot/X86PC/UNDI/pxelinux/
cp initrd.img initrd.img.orig
cp initrd.img /tmp
cd /tmp
mv initrd.img initrd.gz
gunzip initrd.gz
mkdir /mnt/loop
mount -o loop initrd /mnt/loop
cd /mnt/loop/modules/
vi pcitable
Search for the e1000 drivers and add the following line:
0x8086 0x1013 "e1000" "Intel Corp.|82546EB Gigabit Ethernet
Controller"
write the file
cd /tmp
umount /mnt/loop
gzip initrd
mv initrd.gz initrd.img
mv initrd.img /tftpboot/X86PC/UNDI/pxelinux/
Then boot the node.
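The pcitable edit itself can be sketched and sanity-checked outside the initrd. Here a throwaway file stands in for /mnt/loop/modules/pcitable (the real one must be edited inside the mounted image as above); the first entry is an illustrative placeholder, and the appended line is the one quoted in this post:

```shell
# Sketch of the pcitable edit against a scratch copy; on a real frontend
# the mount/umount steps above apply and the file lives inside initrd.img.
workdir=$(mktemp -d)

# Stand-in pcitable with one existing e1000 entry (illustrative values).
printf '0x8086\t0x100e\t"e1000"\t"Intel Corp.|82540EM Gigabit Ethernet Controller"\n' \
    > "$workdir/pcitable"

# Append the line from this post for the unrecognized 82546EB NIC.
printf '0x8086\t0x1013\t"e1000"\t"Intel Corp.|82546EB Gigabit Ethernet Controller"\n' \
    >> "$workdir/pcitable"

# Confirm the new id is present before repacking initrd.img.
count=$(grep -c '0x1013' "$workdir/pcitable")
echo "entries added: $count"
```

The grep check is a cheap way to catch a typo in the hex ids before going through the gzip/mount cycle again.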
Hope this helps.
Thanks,
Joe
On Tue, 2003-12-02 at 06:15, Joe Landman wrote:
> Folks:
>
> Working on integrating a Supermicro MB based cluster. Discovered early
> on that all of the compute nodes have an Intel based NIC that RedHat
> doesn't know anything about (any version of RH). Some of the
> administrative nodes have other similar issues. I am seeing simply a
> suprising number of mis/un detected hardware across the collection of MBs.
>
> Anyone have advice on where to get modules/module source for Redhat
> for these things? It looks like I will need to rebuild the boot CD,
> though the several times I have tried this previously have failed to
> produce a working/bootable system. It looks like new modules need to be
> created/inserted into the boot process (head node and cluster nodes)
> kernels, as well as into the installable kernels.
>
> Has anyone done this for a Supermicro MB based system? Thanks .
>
> Joe
--
===================================================================
Joe Kaiser - Systems Administrator
Fermi Lab
CD/OSS-SCS Never laugh at live dragons.
630-840-6444
jlkaiser at fnal.gov
===================================================================
From jghobrial at uh.edu Wed Dec 3 08:59:15 2003
From: jghobrial at uh.edu (Joseph)
Date: Wed, 3 Dec 2003 10:59:15 -0600 (CST)
Subject: [Rocks-Discuss]cluster-fork
In-Reply-To: <A43157DE-2522-11D8-A7A4-000A95DA5638@sdsc.edu>
References: <3FCB879E.8050905@miami.edu>
<Pine.LNX.4.56.0312011331460.5615@mail.tlc2.uh.edu>
<1B15A45F-2457-11D8-A374-00039389B580@uci.edu>
<Pine.LNX.4.56.0312021000490.7581@mail.tlc2.uh.edu>
<3FCCC5BF.3030903@miami.edu> <A43157DE-2522-11D8-A7A4-000A95DA5638@sdsc.edu>
Message-ID: <Pine.LNX.4.56.0312031057280.11073@mail.tlc2.uh.edu>
Here is the error I receive when I remove the file encoder.pyc and run the
command cluster-fork
Traceback (innermost last):
File "/opt/rocks/sbin/cluster-fork", line 88, in ?
import rocks.pssh
File "/opt/rocks/lib/python/rocks/pssh.py", line 96, in ?
import gmon.encoder
ImportError: No module named encoder
Thanks,
Joseph
On Tue, 2 Dec 2003, Mason J. Katz wrote:
> Python creates the .pyc files for you, and does not remove the original
> .py file. I would be extremely surprised if two "identical" .pyc files
> had the same md5 checksum. I'd expect this to be more like C .o file
> which always contain random data to pad out to the end of a page and
> 32/64 bit word sizes. Still this is just a guess, the real point is
> you can always remove the .pyc files and the .py will regenerate it
> when imported (although standard UNIX file/dir permission still apply).
>
> What is the import error you get from cluster-fork?
>
> -mjk
>
> On Dec 2, 2003, at 9:02 AM, Angel Li wrote:
>
> > Joseph wrote:
> >
> >> Indeed my md5sum is different for encoder.pyc. However, when I pulled
> >> the file and run "cluster-fork" python responds about an import
> >> problem. So it seems that regeneration did not occur. Is there a flag
> >> I need to pass?
> >>
> >> I have also tried to figure out what package provides encoder and
> >> reinstall the package, but an rpm query reveals nothing.
> >>
> >> If this is a generated file, what generates it?
> >>
> >> It seems that an rpm file query on ganglia show that files in the
> >> directory belong to the package, but encoder.pyc does not.
> >>
> >> Thanks,
> >> Joseph
> >>
> >>
> >>
> > I have finally found the python sources in the HPC rolls CD, filename
> > ganglia-python-3.0.0-2.i386.rpm. I'm not familiar with python but it
> > seems python "compiles" the .py files to ".pyc" and then deletes the
> > source file the first time they are referenced? I also noticed that
> > there are two versions of python installed. Maybe the pyc files from
> > one version won't load into the other one?
> >
> > Angel
> >
> >
>
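The remove-and-reimport behavior described in the quoted reply is easy to verify with a throwaway module. Note that modern Pythons write the compiled file into a __pycache__ directory rather than next to the source as Python 1.5 did, but deleting it and importing again still regenerates it; the module name here is a made-up stand-in for gmon's encoder:

```shell
# Demonstrate that deleting a compiled Python file is safe: importing the
# .py source simply regenerates it.
workdir=$(mktemp -d)
printf 'VALUE = 42\n' > "$workdir/encoder_demo.py"
cd "$workdir"

python3 -c "import encoder_demo"            # first import compiles the module
ls __pycache__/encoder_demo*.pyc            # compiled file now exists
rm -rf __pycache__                          # simulate removing a bad .pyc
val=$(python3 -c "import encoder_demo; print(encoder_demo.VALUE)")
echo "$val"                                 # import recompiled and ran fine
```

Joseph's "No module named encoder" error is the other case: the .py source was never shipped, so once the stale .pyc is gone there is nothing left to regenerate from.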
From mjk at sdsc.edu Wed Dec 3 15:19:38 2003
From: mjk at sdsc.edu (Mason J. Katz)
Date: Wed, 3 Dec 2003 15:19:38 -0800
Subject: [Rocks-Discuss]cluster-fork
In-Reply-To: <Pine.LNX.4.56.0312031057280.11073@mail.tlc2.uh.edu>
References: <3FCB879E.8050905@miami.edu>
<Pine.LNX.4.56.0312011331460.5615@mail.tlc2.uh.edu> <1B15A45F-2457-11D8-
A374-00039389B580@uci.edu> <Pine.LNX.4.56.0312021000490.7581@mail.tlc2.uh.edu>
<3FCCC5BF.3030903@miami.edu> <A43157DE-2522-11D8-A7A4-000A95DA5638@sdsc.edu>
<Pine.LNX.4.56.0312031057280.11073@mail.tlc2.uh.edu>
Message-ID: <2A332131-25E7-11D8-A641-000A95DA5638@sdsc.edu>
This file comes from a ganglia package. What does
# rpm -q ganglia-receptor
return?
-mjk
On Dec 3, 2003, at 8:59 AM, Joseph wrote:
> Here is the error I receive when I remove the file encoder.pyc and run
> the
> command cluster-fork
>
> Traceback (innermost last):
> File "/opt/rocks/sbin/cluster-fork", line 88, in ?
> import rocks.pssh
> File "/opt/rocks/lib/python/rocks/pssh.py", line 96, in ?
> import gmon.encoder
> ImportError: No module named encoder
>
> Thanks,
> Joseph
>
>
> On Tue, 2 Dec 2003, Mason J. Katz wrote:
>
>> Python creates the .pyc files for you, and does not remove the
>> original
>> .py file. I would be extremely surprised if two "identical" .pyc
>> files
>> had the same md5 checksum. I'd expect this to be more like C .o file
>> which always contain random data to pad out to the end of a page and
>> 32/64 bit word sizes. Still this is just a guess, the real point is
>> you can always remove the .pyc files and the .py will regenerate it
>> when imported (although standard UNIX file/dir permission still
>> apply).
>>
>> What is the import error you get from cluster-fork?
>>
>> -mjk
>>
>> On Dec 2, 2003, at 9:02 AM, Angel Li wrote:
>>
>>> Joseph wrote:
>>>
>>>> Indeed my md5sum is different for encoder.pyc. However, when I
>>>> pulled
>>>> the file and run "cluster-fork" python responds about an import
>>>> problem. So it seems that regeneration did not occur. Is there a
>>>> flag
>>>> I need to pass?
>>>>
>>>> I have also tried to figure out what package provides encoder and
>>>> reinstall the package, but an rpm query reveals nothing.
>>>>
>>>> If this is a generated file, what generates it?
>>>>
>>>> It seems that an rpm file query on ganglia show that files in the
>>>> directory belong to the package, but encoder.pyc does not.
>>>>
>>>> Thanks,
>>>> Joseph
>>>>
>>>>
>>>>
>>> I have finally found the python sources in the HPC rolls CD, filename
>>> ganglia-python-3.0.0-2.i386.rpm. I'm not familiar with python but it
>>> seems python "compiles" the .py files to ".pyc" and then deletes the
>>> source file the first time they are referenced? I also noticed that
>>> there are two versions of python installed. Maybe the pyc files from
>>> one version won't load into the other one?
>>>
>>> Angel
>>>
>>>
>>
From csamuel at vpac.org Wed Dec 3 18:09:26 2003
From: csamuel at vpac.org (Chris Samuel)
Date: Thu, 4 Dec 2003 13:09:26 +1100
Subject: [Rocks-Discuss]Confirmation of Rocks 3.1.0 Opteron support & RHEL
trademark removal ?
Message-ID: <200312041309.27986.csamuel@vpac.org>
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
Hi folks,
Can someone confirm that the next Rocks release will support Opteron please ?
Also, I noticed that the current Rocks release on Itanium based on RHEL still
has a lot of mentions of RedHat in it, which from my reading of their
trademark guidelines is not permitted, is that fixed in the new version ?
cheers!
Chris
- --
Christopher Samuel - (03)9925 4751 - VPAC Systems & Network Admin
Victorian Partnership for Advanced Computing http://www.vpac.org/
Bldg 91, 110 Victoria Street, Carlton South, VIC 3053, Australia
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.2 (GNU/Linux)
iD8DBQE/zpdWO2KABBYQAh8RAqB8AJ9FG+IjIeem21qlFS6XYIHamIMPmwCghVTV
AgjAlVHWgdv/KzYQinHGPxs=
=IAWU
-----END PGP SIGNATURE-----
From bruno at rocksclusters.org Wed Dec 3 18:46:30 2003
From: bruno at rocksclusters.org (Greg Bruno)
Date: Wed, 3 Dec 2003 18:46:30 -0800
Subject: [Rocks-Discuss]Confirmation of Rocks 3.1.0 Opteron support & RHEL
trademark removal ?
In-Reply-To: <200312041309.27986.csamuel@vpac.org>
References: <200312041309.27986.csamuel@vpac.org>
Message-ID: <10AD9827-2604-11D8-86E6-000A95C4E3B4@rocksclusters.org>
> Can someone confirm that the next Rocks release will support Opteron
> please ?
yes, it will support opteron.
> Also, I noticed that the current Rocks release on Itanium based on
> RHEL still
> has a lot of mentions of RedHat in it, which from my reading of their
> trademark guidelines is not permitted, is that fixed in the new
> version ?
and yes, (even though it doesn't feel like the right thing to do, as
redhat has offered to the community some outstanding technologies that
we'd like to credit), all redhat trademarks will be removed from 3.1.0.
- gb
From fds at sdsc.edu Thu Dec 4 06:46:32 2003
From: fds at sdsc.edu (Federico Sacerdoti)
Date: Thu, 4 Dec 2003 06:46:32 -0800
Subject: [Rocks-Discuss]cluster-fork
In-Reply-To: <Pine.LNX.4.56.0312031057280.11073@mail.tlc2.uh.edu>
References: <3FCB879E.8050905@miami.edu>
<Pine.LNX.4.56.0312011331460.5615@mail.tlc2.uh.edu> <1B15A45F-2457-11D8-
A374-00039389B580@uci.edu> <Pine.LNX.4.56.0312021000490.7581@mail.tlc2.uh.edu>
<3FCCC5BF.3030903@miami.edu> <A43157DE-2522-11D8-A7A4-000A95DA5638@sdsc.edu>
<Pine.LNX.4.56.0312031057280.11073@mail.tlc2.uh.edu>
Message-ID: <A69923FA-2668-11D8-804D-000393A4725A@sdsc.edu>
Please install the
http://www.rocksclusters.org/errata/3.0.0/ganglia-python-3.0.1
-2.i386.rpm package, which includes the correct encoder.py file. (This
package is listed on the 3.0.0 errata page)
-Federico
On Dec 3, 2003, at 8:59 AM, Joseph wrote:
> Here is the error I receive when I remove the file encoder.pyc and run
> the
> command cluster-fork
>
> Traceback (innermost last):
> File "/opt/rocks/sbin/cluster-fork", line 88, in ?
> import rocks.pssh
> File "/opt/rocks/lib/python/rocks/pssh.py", line 96, in ?
> import gmon.encoder
> ImportError: No module named encoder
>
> Thanks,
> Joseph
>
>
> On Tue, 2 Dec 2003, Mason J. Katz wrote:
>
>> Python creates the .pyc files for you, and does not remove the
>> original
>> .py file. I would be extremely surprised if two "identical" .pyc
>> files
>> had the same md5 checksum. I'd expect this to be more like C .o file
>> which always contain random data to pad out to the end of a page and
>> 32/64 bit word sizes. Still this is just a guess, the real point is
>> you can always remove the .pyc files and the .py will regenerate it
>> when imported (although standard UNIX file/dir permission still
>> apply).
>>
>> What is the import error you get from cluster-fork?
>>
>> -mjk
>>
>> On Dec 2, 2003, at 9:02 AM, Angel Li wrote:
>>
>>> Joseph wrote:
>>>
>>>> Indeed my md5sum is different for encoder.pyc. However, when I
>>>> pulled
>>>> the file and run "cluster-fork" python responds about an import
>>>> problem. So it seems that regeneration did not occur. Is there a
>>>> flag
>>>> I need to pass?
>>>>
>>>> I have also tried to figure out what package provides encoder and
>>>> reinstall the package, but an rpm query reveals nothing.
>>>>
>>>> If this is a generated file, what generates it?
>>>>
>>>> It seems that an rpm file query on ganglia show that files in the
>>>> directory belong to the package, but encoder.pyc does not.
>>>>
>>>> Thanks,
>>>> Joseph
>>>>
>>>>
>>>>
>>> I have finally found the python sources in the HPC rolls CD, filename
>>> ganglia-python-3.0.0-2.i386.rpm. I'm not familiar with python but it
>>> seems python "compiles" the .py files to ".pyc" and then deletes the
>>> source file the first time they are referenced? I also noticed that
>>> there are two versions of python installed. Maybe the pyc files from
>>> one version won't load into the other one?
>>>
>>> Angel
>>>
>>>
>>
>>
Federico
Rocks Cluster Group, San Diego Supercomputing Center, CA
From jghobrial at uh.edu Thu Dec 4 07:14:21 2003
From: jghobrial at uh.edu (Joseph)
Date: Thu, 4 Dec 2003 09:14:21 -0600 (CST)
Subject: [Rocks-Discuss]cluster-fork
In-Reply-To: <A69923FA-2668-11D8-804D-000393A4725A@sdsc.edu>
References: <3FCB879E.8050905@miami.edu>
<Pine.LNX.4.56.0312011331460.5615@mail.tlc2.uh.edu>
<1B15A45F-2457-11D8-A374-00039389B580@uci.edu>
<Pine.LNX.4.56.0312021000490.7581@mail.tlc2.uh.edu>
<3FCCC5BF.3030903@miami.edu> <A43157DE-2522-11D8-A7A4-000A95DA5638@sdsc.edu>
<Pine.LNX.4.56.0312031057280.11073@mail.tlc2.uh.edu>
<A69923FA-2668-11D8-804D-000393A4725A@sdsc.edu>
Message-ID: <Pine.LNX.4.56.0312040913110.13972@mail.tlc2.uh.edu>
Thank you very much this solved the problem.
Joseph
On Thu, 4 Dec 2003, Federico Sacerdoti wrote:
> Please install the
> http://www.rocksclusters.org/errata/3.0.0/ganglia-python-3.0.1
> -2.i386.rpm package, which includes the correct encoder.py file. (This
> package is listed on the 3.0.0 errata page)
>
> -Federico
>
> On Dec 3, 2003, at 8:59 AM, Joseph wrote:
>
> > Here is the error I receive when I remove the file encoder.pyc and run
> > the
> > command cluster-fork
> >
> > Traceback (innermost last):
> > File "/opt/rocks/sbin/cluster-fork", line 88, in ?
> > import rocks.pssh
> > File "/opt/rocks/lib/python/rocks/pssh.py", line 96, in ?
> > import gmon.encoder
> > ImportError: No module named encoder
> >
> > Thanks,
> > Joseph
> >
> >
> > On Tue, 2 Dec 2003, Mason J. Katz wrote:
> >
> >> Python creates the .pyc files for you, and does not remove the
> >> original
> >> .py file. I would be extremely surprised if two "identical" .pyc
> >> files
> >> had the same md5 checksum. I'd expect this to be more like C .o file
> >> which always contain random data to pad out to the end of a page and
> >> 32/64 bit word sizes. Still this is just a guess, the real point is
> >> you can always remove the .pyc files and the .py will regenerate it
> >> when imported (although standard UNIX file/dir permission still
> >> apply).
> >>
> >> What is the import error you get from cluster-fork?
> >>
> >> -mjk
> >>
> >> On Dec 2, 2003, at 9:02 AM, Angel Li wrote:
> >>
> >>> Joseph wrote:
> >>>
> >>>> Indeed my md5sum is different for encoder.pyc. However, when I
> >>>> pulled
> >>>> the file and run "cluster-fork" python responds about an import
> >>>> problem. So it seems that regeneration did not occur. Is there a
> >>>> flag
> >>>> I need to pass?
> >>>>
> >>>> I have also tried to figure out what package provides encoder and
> >>>> reinstall the package, but an rpm query reveals nothing.
> >>>>
> >>>> If this is a generated file, what generates it?
> >>>>
> >>>> It seems that an rpm file query on ganglia show that files in the
> >>>> directory belong to the package, but encoder.pyc does not.
> >>>>
> >>>> Thanks,
> >>>> Joseph
> >>>>
> >>>>
> >>>>
> >>> I have finally found the python sources in the HPC rolls CD, filename
> >>> ganglia-python-3.0.0-2.i386.rpm. I'm not familiar with python but it
> >>> seems python "compiles" the .py files to ".pyc" and then deletes the
> >>> source file the first time they are referenced? I also noticed that
> >>> there are two versions of python installed. Maybe the pyc files from
> >>> one version won't load into the other one?
> >>>
> >>> Angel
> >>>
> >>>
> >>
> >>
> Federico
>
> Rocks Cluster Group, San Diego Supercomputing Center, CA
>
From vrowley at ucsd.edu Thu Dec 4 12:29:55 2003
From: vrowley at ucsd.edu (V. Rowley)
Date: Thu, 04 Dec 2003 12:29:55 -0800
Subject: [Rocks-Discuss]Re: PXE boot problems
In-Reply-To: <3FCBC037.5000302@ucsd.edu>
References: <3FCBC037.5000302@ucsd.edu>
Message-ID: <3FCF9943.1020806@ucsd.edu>
Uh, nevermind. We had upgraded syslinux on our frontend, not the node
we were trying to PXE boot. Sigh.
V. Rowley wrote:
> We have installed a ROCKS 3.0.0 frontend on a DL380 and are trying to
> install a compute node via PXE. We are getting an error similar to the
> one mentioned in the archives, e.g.
>
>> Loading initrd.img....
>> Ready
>>
>> Failed to free base memory
>>
>
> We have upgraded to syslinux-2.07-1, per the suggestion in the archives,
> but continue to get the same error. Any ideas?
>
--
Vicky Rowley email: vrowley at ucsd.edu
Biomedical Informatics Research Network work: (858) 536-5980
University of California, San Diego fax: (858) 822-0828
9500 Gilman Drive
La Jolla, CA 92093-0715
See pictures from our trip to China at http://www.sagacitech.com/Chinaweb
From cdwan at mail.ahc.umn.edu Fri Dec 5 08:16:07 2003
From: cdwan at mail.ahc.umn.edu (Chris Dwan (CCGB))
Date: Fri, 5 Dec 2003 10:16:07 -0600 (CST)
Subject: [Rocks-Discuss]Private NIS master
Message-ID: <Pine.GSO.4.58.0312042305070.18193@lenti.med.umn.edu>
Hello all. Long time listener, first time caller. Thanks for all the
great work.
I'm integrating a Rocks cluster into an existing NIS domain. I noticed
that while the cluster database now supports a PrivateNISMaster, that
variable doesn't make it into the /etc/yp.conf on the compute nodes. They
remain broadcast.
Assume that, for whatever reason, I don't want to set up a repeater
(slave) ypserv process on my frontend. I added the option "--nisserver
<var name="Kickstart_PrivateNISMaster"/>" to the
"profiles/3.0.0/nodes/nis-client.xml" file, removed the ypserver on my
frontend, and it works like I want it to.
Am I missing anything fundamental here?
-Chris Dwan
University of Minnesota
From wyzhong78 at msn.com Mon Dec 8 06:18:34 2003
From: wyzhong78 at msn.com (zhong wenyu)
Date: Mon, 08 Dec 2003 22:18:34 +0800
Subject: [Rocks-Discuss]3.0.0 problem: not able to boot up
Message-ID: <BAY3-F14uFqD45TpNO40002c14c@hotmail.com>
Hi,everyone!
I installed Rocks 3.0.0 with the defaults; there wasn't any trouble during
the install. But I haven't been able to boot: it stops at the beginning, the
message "GRUB" shows on the screen, and then it just waits....
My hardware is a dual Xeon 2.4G, MSI 9138, Seagate SCSI disk.
Any help is appreciated!
_________________________________________________________________
MSN Explorer: http://explorer.msn.com/lccn/
From angelini at vki.ac.be Mon Dec 8 06:20:45 2003
From: angelini at vki.ac.be (Angelini Giuseppe)
Date: Mon, 08 Dec 2003 15:20:45 +0100
Subject: [Rocks-Discuss]How to use MPICH with ssh
Message-ID: <3FD488BD.3EBBDB8D@vki.ac.be>
Dear rocks folk,
I have recently installed mpich with Lahey Fortran, and now that I can
compile and link I would like to run, but it seems that I have another
problem. In fact I get the following error message when I try to run:
[panara at compute-0-7 ~]$ mpirun -np $NPROC -machinefile $PBS_NODEFILE
$DPT/hybflow
p0_13226: p4_error: Path to program is invalid while starting
/dc_03_04/panara/PREPRO_TESTS/hybflow with /usr/bin/rsh on compute-0-7:
-1
p4_error: latest msg from perror: No such file or directory
p0_13226: p4_error: Child process exited while making connection to
remote process on compute-0-6: 0
p0_13226: (6.025133) net_send: could not write to fd=4, errno = 32
p0_13226: (6.025231) net_send: could not write to fd=4, errno = 32
I am wondering why it is looking for /usr/bin/rsh for the communication,
I expected to use ssh and not rsh.
Any help will be welcome.
Regards.
Giuseppe Angelini
From casuj at cray.com Mon Dec 8 07:31:21 2003
From: casuj at cray.com (John Casu)
Date: Mon, 8 Dec 2003 07:31:21 -0800
Subject: [Rocks-Discuss]How to use MPICH with ssh
In-Reply-To: <3FD488BD.3EBBDB8D@vki.ac.be>; from Angelini Giuseppe on Mon, Dec 08,
2003 at 03:20:45PM +0100
References: <3FD488BD.3EBBDB8D@vki.ac.be>
Message-ID: <20031208073121.A10151@stemp3.wc.cray.com>
On Mon, Dec 08, 2003 at 03:20:45PM +0100, Angelini Giuseppe wrote:
>
> Dear rocks folk,
>
>
> I have recently installed mpich with Lahey Fortran and now that I can
> compile and link,
> I would like to run but it seems that I have another problem. In fact I
> have the following
> error message when I try to run:
>
> [panara at compute-0-7 ~]$ mpirun -np $NPROC -machinefile $PBS_NODEFILE
> $DPT/hybflow
> p0_13226: p4_error: Path to program is invalid while starting
> /dc_03_04/panara/PREPRO_TESTS/hybflow with /usr/bin/rsh on compute-0-7:
> -1
> p4_error: latest msg from perror: No such file or directory
> p0_13226: p4_error: Child process exited while making connection to
> remote process on compute-0-6: 0
> p0_13226: (6.025133) net_send: could not write to fd=4, errno = 32
> p0_13226: (6.025231) net_send: could not write to fd=4, errno = 32
>
> I am wondering why it is looking for /usr/bin/rsh for the communication,
>
> I expected to use ssh and not rsh.
>
> Any help will be welcome.
>
build mpich thus:
RSHCOMMAND=ssh ./configure .....
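For reference, a slightly fuller recipe along those lines; the version number and install prefix here are illustrative, not Rocks defaults, and this is a build sketch rather than something to paste verbatim. MPICH's ch_p4 device also honors an environment variable at run time, so existing binaries can be steered to ssh without rebuilding:

```shell
# Hypothetical rebuild of MPICH so its ch_p4 device uses ssh instead of rsh.
# Version and prefix are illustrative.
cd mpich-1.2.5
RSHCOMMAND=ssh ./configure --prefix=/opt/mpich-ssh/gnu
make
make install

# Alternatively, point an already-built ch_p4 mpirun at ssh per-job:
export P4_RSHCOMMAND=ssh
```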
>
> Regards.
>
>
> Giuseppe Angelini
--
"Roses are red, Violets are blue,
You lookin' at me ?
YOU LOOKIN' AT ME ?!" -- Get Fuzzy.
=======================================================================
John Casu
Cray Inc. casuj at cray.com
411 First Avenue South, Suite 600 Tel: (206) 701-2173
Seattle, WA 98104-2860 Fax: (206) 701-2500
=======================================================================
From davidow at molbio.mgh.harvard.edu Mon Dec 8 08:12:53 2003
From: davidow at molbio.mgh.harvard.edu (Lance Davidow)
Date: Mon, 8 Dec 2003 11:12:53 -0500
Subject: [Rocks-Discuss]How to use MPICH with ssh
In-Reply-To: <3FD488BD.3EBBDB8D@vki.ac.be>
33. References: <3FD488BD.3EBBDB8D@vki.ac.be>
Message-ID: <p06002001bbfa51fea005@[132.183.190.222]>
Giuseppe,
Here's an answer from a newbie who just faced the same problem.
You are using the wrong flavor of mpich (and mpirun). There are
several different distributions, which work differently in ROCKS. The
one you are using in the default path expects serv_p4 daemons and
.rhosts files in your home directory. The different flavors may be
more compatible with different compilers as well.
[lance at rescluster2 lance]$ which mpirun
/opt/mpich-mpd/gnu/bin/mpirun
the one you probably want is
/opt/mpich/gnu/bin/mpirun
[lance at rescluster2 lance]$ locate mpirun
...
/opt/mpich-mpd/gnu/bin/mpirun
...
/opt/mpich/myrinet/gnu/bin/mpirun
...
/opt/mpich/gnu/bin/mpirun
Cheers,
Lance
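A minimal way to act on this is to put the desired flavor first in PATH. Sketched here with scratch directories standing in for the real /opt/mpich-mpd/gnu/bin and /opt/mpich/gnu/bin, so the effect is visible on any machine:

```shell
# Simulate picking the right mpich flavor via PATH ordering, using
# throwaway stand-ins for the two mpirun installations.
workdir=$(mktemp -d)
mkdir -p "$workdir/mpich-mpd/bin" "$workdir/mpich/bin"
printf '#!/bin/sh\necho mpd-flavor\n' > "$workdir/mpich-mpd/bin/mpirun"
printf '#!/bin/sh\necho p4-flavor\n'  > "$workdir/mpich/bin/mpirun"
chmod +x "$workdir/mpich-mpd/bin/mpirun" "$workdir/mpich/bin/mpirun"

# With the mpd flavor first in PATH, the wrong mpirun wins...
PATH="$workdir/mpich-mpd/bin:$workdir/mpich/bin:/usr/bin:/bin"; hash -r
flavor_before=$(mpirun)

# ...so put the plain p4 build first, as suggested above.
PATH="$workdir/mpich/bin:$workdir/mpich-mpd/bin:/usr/bin:/bin"; hash -r
flavor_after=$(mpirun)
echo "$flavor_before -> $flavor_after"
```

On a real Rocks node the equivalent one-liner would be something like `export PATH=/opt/mpich/gnu/bin:$PATH`, then `which mpirun` to confirm which flavor resolves.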
At 3:20 PM +0100 12/8/03, Angelini Giuseppe wrote:
>Dear rocks folk,
>
>
>I have recently installed mpich with Lahey Fortran and now that I can
>compile and link,
>I would like to run but it seems that I have another problem. In fact I
>have the following
>error message when I try to run:
>
>[panara at compute-0-7 ~]$ mpirun -np $NPROC -machinefile $PBS_NODEFILE
>$DPT/hybflow
>p0_13226: p4_error: Path to program is invalid while starting
>/dc_03_04/panara/PREPRO_TESTS/hybflow with /usr/bin/rsh on compute-0-7:
>-1
> p4_error: latest msg from perror: No such file or directory
>p0_13226: p4_error: Child process exited while making connection to
>remote process on compute-0-6: 0
>p0_13226: (6.025133) net_send: could not write to fd=4, errno = 32
>p0_13226: (6.025231) net_send: could not write to fd=4, errno = 32
>
>I am wondering why it is looking for /usr/bin/rsh for the communication,
>
>I expected to use ssh and not rsh.
>
>Any help will be welcome.
>
>
>Regards.
>
>Giuseppe Angelini
--
Lance Davidow, PhD
Director of Bioinformatics
Dept of Molecular Biology
Mass General Hospital
Boston MA 02114
davidow at molbio.mgh.harvard.edu
617.726-5955
Fax: 617.726-6893
From rscarce at caci.com Fri Dec 5 16:43:00 2003
From: rscarce at caci.com (Reed Scarce)
Date: Fri, 5 Dec 2003 19:43:00 -0500
Subject: [Rocks-Discuss]PXE and system images
Message-ID: <OFF783DCCA.8F016562-ON85256DF3.008001FC-85256DF7.00043E45@caci.com>
We want to initialize new hardware with a known good image from identical
hardware currently in use. The process imagined would be to PXE boot to a
disk image server, PXE would create a RAM system that would request the
system disk image from the server, which would push the desired system
disk image to the requesting system. Upon completion the system would be
available as a cluster member.
The lab configuration is a PC grade frontend with two 3Com 905s and a
single server grade cluster node with integrated Intel 82551 (10/100)(the
only PXE interface) and two integrated Intel 82546 (10/100/1000). The
cluster node is one of the stock of nodes for the expansion. The stock of
nodes have a Linux OS pre-installed, which would be eliminated in the
process.
Currently the node will PXE boot from the 10/100 and pick up an
installation boot from one of the g-bit interfaces. From there kickstart
wants to take over.
Any recommendations how to get kickstart to push an image to the disk?
Thanks,
Reed Scarce
-------------- next part --------------
An HTML attachment was scrubbed...
URL: https://lists.sdsc.edu/pipermail/npaci-rocks-
discussion/attachments/20031205/dad04521/attachment-0001.html
From wyzhong78 at msn.com Mon Dec 8 05:36:37 2003
From: wyzhong78 at msn.com (zhong wenyu)
Date: Mon, 08 Dec 2003 21:36:37 +0800
Subject: [Rocks-Discuss]Rocks 3.0.0 problem:not able to boot up
Message-ID: <BAY3-F9yOi5AgJQlDrR0002a5da@hotmail.com>
Hi,everyone!
I have installed Rocks 3.0.0 with default options successfully; there was not
any trouble. But when I boot it up, it stops at the beginning, just showing
"GRUB" on the screen and waiting...
Thanks for your help!
_________________________________________________________________
MSN Explorer: http://explorer.msn.com/lccn/
From daniel.kidger at quadrics.com Mon Dec 8 09:54:53 2003
From: daniel.kidger at quadrics.com (daniel.kidger at quadrics.com)
Date: Mon, 8 Dec 2003 17:54:53 -0000
Subject: [Rocks-Discuss]custom-kernels : naming conventions ? (Rocks 3.0.0)
Message-ID: <30062B7EA51A9045B9F605FAAC1B4F622357C7@tardis0.quadrics.com>
Dear all,
Previously I have been installing a custom kernel on the compute nodes
with an "extend-compute.xml" and an "/etc/init.d/qsconfigure" (to fix grub.conf).
However I am now trying to do it the 'proper' way. So I do (on the frontend):
# cp qsnet-RedHat-kernel-2.4.18-27.3.10qsnet.i686.rpm
/home/install/rocks-dist/7.3/en/os/i386/force/RPMS
# cd /home/install
# rocks-dist dist
# SSH_NO_PASSWD=1 shoot-node compute-0-0
Hence:
# find /home/install/ |xargs -l grep -nH qsnet
shows me that hdlist and hdlist2 now contain this RPM (and indeed, if I duplicate
my rpm in that directory, rocks-dist notices this and warns me).
However the node always ends up with "2.4.20-20.7smp" again.
anaconda-ks.cfg contains just "kernel-smp" and install.log has "Installing kernel-
smp-2.4.20-20.7."
So my question is:
It looks like my RPM has a name that Rocks doesn't understand properly.
What is wrong with my name ?
and what are the rules for getting the correct name ?
(.i686.rpm is of course correct, but I don't have -smp. in the name. Is this
the problem?)
cf. Greg Bruno's wisdom:
https://lists.sdsc.edu/pipermail/npaci-rocks-discussion/2003-April/001770.html
Yours,
Daniel.
--------------------------------------------------------------
Dr. Dan Kidger, Quadrics Ltd. daniel.kidger at quadrics.com
One Bridewell St., Bristol, BS1 2AA, UK 0117 915 5505
----------------------- www.quadrics.com --------------------
From DGURGUL at PARTNERS.ORG Mon Dec 8 11:09:27 2003
From: DGURGUL at PARTNERS.ORG (Gurgul, Dennis J.)
Date: Mon, 8 Dec 2003 14:09:27 -0500
Subject: [Rocks-Discuss]cluster-fork --mpd strangeness
Message-ID: <BC447F1AD529D311B4DE0008C71BF2EB0AE15840@phsexch7.mgh.harvard.edu>
I just did "cluster-fork rpm -Uvh /sourcedir/ganglia-python-3.0.1-2.i386.rpm" and
then "cluster-fork service gschedule restart" (not sure I had to do the last).
I also put 3.0.1-2 and restarted gschedule on the frontend.
Now I run "cluster-fork --mpd w".
I currently have a user who ssh'd to compute-0-8 from the frontend and one who
ssh'd into compute-0-17 from the front end.
But the return shows the users on lines for 17 (for the user on 0-8) and 10 (for
the user on 0-17):
17: 1:58pm up 24 days, 3:20, 1 user, load average: 0.00, 0.00, 0.03
17: USER TTY FROM LOGIN@ IDLE JCPU PCPU WHAT
17: lance pts/0 rescluster2.mgh. 1:31pm 40.00s 0.02s 0.02s -bash
10: 1:58pm up 24 days, 3:21, 1 user, load average: 0.02, 0.04, 0.07
10: USER TTY FROM LOGIN@ IDLE JCPU PCPU WHAT
10: dennis pts/0 rescluster2.mgh. 1:57pm 17.00s 0.02s 0.02s -bash
When I do "cluster-fork w" (without the --mpd) the users show up on the correct
nodes.
Do the numbers on the left of the --mpd output correspond to the node names?
Thanks.
Dennis
Dennis J. Gurgul
Partners Health Care System
Research Management
Research Computing Core
617.724.3169
From DGURGUL at PARTNERS.ORG Mon Dec 8 11:28:30 2003
From: DGURGUL at PARTNERS.ORG (Gurgul, Dennis J.)
Date: Mon, 8 Dec 2003 14:28:30 -0500
Subject: [Rocks-Discuss]cluster-fork --mpd strangeness
Message-ID: <BC447F1AD529D311B4DE0008C71BF2EB0AE15843@phsexch7.mgh.harvard.edu>
Maybe this is a better description of the "strangeness".
I did "cluster-fork --mpd hostname":
1: compute-0-0.local
2: compute-0-1.local
3: compute-0-3.local
4: compute-0-13.local
5: compute-0-11.local
6: compute-0-15.local
7: compute-0-16.local
8: compute-0-19.local
9: compute-0-21.local
10: compute-0-17.local
11: compute-0-5.local
12: compute-0-20.local
13: compute-0-18.local
14: compute-0-12.local
15: compute-0-9.local
16: compute-0-4.local
17: compute-0-8.local
18: compute-0-14.local
19: compute-0-2.local
20: compute-0-6.local
0: compute-0-7.local
21: compute-0-10.local
Dennis J. Gurgul
Partners Health Care System
Research Management
Research Computing Core
617.724.3169
-----Original Message-----
From: npaci-rocks-discussion-admin at sdsc.edu
[mailto:npaci-rocks-discussion-admin at sdsc.edu]On Behalf Of Gurgul,
Dennis J.
Sent: Monday, December 08, 2003 2:09 PM
To: npaci-rocks-discussion at sdsc.edu
Subject: [Rocks-Discuss]cluster-fork --mpd strangeness
I just did "cluster-fork -Uvh /sourcedir/ganglia-python-3.0.1-2.i386.rpm" and
then "cluster-fork service gschedule restart" (not sure I had to do the last).
I also put 3.0.1-2 and restarted gschedule on the frontend.
Now I run "cluster-fork --mpd w".
I currently have a user who ssh'd to compute-0-8 from the frontend and one who
ssh'd into compute-0-17 from the front end.
But the return shows the users on lines for 17 (for the user on 0-8) and 10 (for
the user on 0-17):
17: 1:58pm up 24 days, 3:20, 1 user, load average: 0.00, 0.00, 0.03
17: USER TTY FROM LOGIN@ IDLE JCPU PCPU WHAT
17: lance pts/0 rescluster2.mgh. 1:31pm 40.00s 0.02s 0.02s -bash
10: 1:58pm up 24 days, 3:21, 1 user, load average: 0.02, 0.04, 0.07
10: USER TTY FROM LOGIN@ IDLE JCPU PCPU WHAT
10: dennis pts/0 rescluster2.mgh. 1:57pm 17.00s 0.02s 0.02s -bash
When I do "cluster-fork w" (without the --mpd) the users show up on the
correct
nodes.
Do the numbers on the left of the -mpd output correspond to the node names?
Thanks.
Dennis
Dennis J. Gurgul
Partners Health Care System
Research Management
Research Computing Core
617.724.3169
From tim.carlson at pnl.gov Mon Dec 8 12:35:16 2003
From: tim.carlson at pnl.gov (Tim Carlson)
Date: Mon, 08 Dec 2003 12:35:16 -0800 (PST)
Subject: [Rocks-Discuss]PXE and system images
In-Reply-To:
<OFF783DCCA.8F016562-ON85256DF3.008001FC-85256DF7.00043E45@caci.com>
Message-ID: <Pine.LNX.4.44.0312081226270.19031-100000@scorpion.emsl.pnl.gov>
On Fri, 5 Dec 2003, Reed Scarce wrote:
> We want to initialize new hardware with a known good image from identical
> hardware currently in use. The process imagined would be to PXE boot to a
> disk image server, PXE would create a RAM system that would request the
> system disk image from the server, which would push the desired system
> disk image to the requesting system. Upon completion the system would be
> available as a cluster member.
>
> The lab configuration is a PC grade frontend with two 3Com 905s and a
> single server grade cluster node with integrated Intel 82551 (10/100)(the
> only PXE interface) and two integrated Intel 82546 (10/100/1000). The
> cluster node is one of the stock of nodes for the expansion. The stock of
> nodes have a Linux OS pre-installed, which would be eliminated in the
> process.
>
> Currently the node will PXE boot from the 10/100 and pickup an
> installation boot from one of the g-bit interfaces. From there kickstart
> wants to take over.
>
> Any recommendations how to get kickstart to push an image to the disk?
This sounds like you want to use Oscar instead of ROCKS.
http://oscar.openclustergroup.org/tiki-index.php
I'm not exactly sure why you think that the kickstart process won't give
you exactly the same image on every machine. If the hardware is the same,
you'll get the same image on each machine.
We have boxes with the same setup, 10/100 PXE, and then dual gigabit. Our
method for installing ROCKS on this type of hardware is the following
1) Run insert-ethers and choose "manager" type of node.
2) Connect all the PXE interfaces to the switch and boot them all. Do not
connect the gigabit interface.
3) Once all of the nodes have PXE booted, exit insert-ethers. Start
insert-ethers again and this time choose the compute node type.
4) Hook up the gigabit interface and the PXE interface to your nodes. All
of your machines will now install.
5) In our case, we now quickly disconnect the PXE interface because we
don't want to have the machine continually install. The real ROCKS
method would have you choose (HD/net) for booting in the BIOS, but if you
already have an OS on your machine, you would have to go into the BIOS twice
before the compute nodes were installed. We disable rocks-grub and just
connect up the PXE cable if we need to reinstall.
Tim
Tim Carlson
Voice: (509) 376 3423
Email: Tim.Carlson at pnl.gov
EMSL UNIX System Support
From tim.carlson at pnl.gov Mon Dec 8 12:42:23 2003
From: tim.carlson at pnl.gov (Tim Carlson)
Date: Mon, 08 Dec 2003 12:42:23 -0800 (PST)
Subject: [Rocks-Discuss]custom-kernels : naming conventions ? (Rocks 3.0.0)
In-Reply-To: <30062B7EA51A9045B9F605FAAC1B4F622357C7@tardis0.quadrics.com>
Message-ID: <Pine.LNX.4.44.0312081238270.19031-100000@scorpion.emsl.pnl.gov>
On Mon, 8 Dec 2003 daniel.kidger at quadrics.com wrote:
I've gotten confused from time to time as to where to place custom RPMS
(it's changed between releases), so my not-so-clean method is to just rip
out the kernels in /home/install/rocks-dist/7.3/en/os/i386/Redhat/RPMS
and drop my own in. Then do a
cd /home/install
rocks-dist dist
shoot-node
You are probably running into an issue where the "force" directory is more
of an "in addition to" directory and your 2.4.18 kernel is being noted,
but ignored since the 2.4.20 kernel is newer. I assume your nodes get both
an SMP and a UP version of 2.4.20 and that your custom 2.4.18 is nowhere to
be found on the compute node.
Tim Carlson
Voice: (509) 376 3423
Email: Tim.Carlson at pnl.gov
EMSL UNIX System Support
> Previously I have been installing a custom kernel on the compute nodes
> with an "extend-compute.xml" and an "/etc/init.d/qsconfigure" (to fix grub.conf).
>
> However I am now trying to do it the 'proper' way. So I do (on :
> # cp qsnet-RedHat-kernel-2.4.18-27.3.10qsnet.i686.rpm
> /home/install/rocks-dist/7.3/en/os/i386/force/RPMS
> # cd /home/install
> # rocks-dist dist
> # SSH_NO_PASSWD=1 shoot-node compute-0-0
>
> Hence:
> # find /home/install/ |xargs -l grep -nH qsnet
> shows me that hdlist and hdlist2 now contain this RPM. (and indeed if I duplicate
my rpm in that directory rocks-dist notices this and warns me.)
>
> However the node always ends up with "2.4.20-20.7smp" again.
> anaconda-ks.cfg contains just "kernel-smp" and install.log has "Installing
kernel-smp-2.4.20-20.7."
>
> So my question is:
> It looks like my RPM has a name that Rocks doesn't understand properly.
> What is wrong with my name ?
> and what are the rules for getting the correct name ?
> (.i686.rpm is of course correct, but I don't have -smp. in the name. Is this
the problem?)
>
> cf. Greg Bruno's wisdom:
> https://lists.sdsc.edu/pipermail/npaci-rocks-discussion/2003-April/001770.html
>
>
> Yours,
> Daniel.
From fds at sdsc.edu Mon Dec 8 12:51:12 2003
From: fds at sdsc.edu (Federico Sacerdoti)
Date: Mon, 8 Dec 2003 12:51:12 -0800
Subject: [Rocks-Discuss]cluster-fork --mpd strangeness
In-Reply-To: <BC447F1AD529D311B4DE0008C71BF2EB0AE15843@phsexch7.mgh.harvard.edu>
References: <BC447F1AD529D311B4DE0008C71BF2EB0AE15843@phsexch7.mgh.harvard.edu>
Message-ID: <423D0494-29C0-11D8-804D-000393A4725A@sdsc.edu>
You are right, and I think this is a shortcoming of MPD. There is no
obvious way to force the MPD numbering to correspond to the order the
nodes were called out on the command line (cluster-fork --mpd actually
makes a shell call to mpirun and it calls out all the node names
explicitly). MPD seems to number the output differently, as you found
out.
So mpd for now may be more useful for jobs that are not sensitive to
this. If enough of you find this shortcoming to be a real annoyance, we
could work on putting the node name label on the output by explicitly
calling "hostname" or similar.
Good ideas are welcome :)
-Federico
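Federico's suggestion of labeling output with the node name can be sketched in Python: build a rank-to-hostname map from one "cluster-fork --mpd hostname" run, then rewrite the numeric labels of a later run's output. This is only an illustrative sketch, not anything shipped with Rocks, and it assumes the MPD ring numbering stays stable between the two invocations -- which this thread does not guarantee.

```python
import re

def build_rank_map(hostname_output):
    """Map MPD rank -> hostname from 'cluster-fork --mpd hostname' output.
    Lines look like '17: compute-0-8.local'."""
    ranks = {}
    for line in hostname_output.splitlines():
        m = re.match(r'^(\d+):\s+(\S+)', line)
        if m:
            ranks[int(m.group(1))] = m.group(2)
    return ranks

def relabel(mpd_output, ranks):
    """Replace a leading 'N:' rank label with the node's hostname,
    leaving unrecognized lines untouched."""
    out = []
    for line in mpd_output.splitlines():
        m = re.match(r'^(\d+):(.*)', line)
        if m and int(m.group(1)) in ranks:
            out.append(ranks[int(m.group(1))] + ':' + m.group(2))
        else:
            out.append(line)
    return '\n'.join(out)
```

Embedding hostname directly in the forked command (as Federico proposes) avoids the stable-numbering assumption entirely.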
On Dec 8, 2003, at 11:28 AM, Gurgul, Dennis J. wrote:
> Maybe this is a better description of the "strangeness".
>
> I did "cluster-fork --mpd hostname":
>
> 1: compute-0-0.local
> 2: compute-0-1.local
> 3: compute-0-3.local
> 4: compute-0-13.local
> 5: compute-0-11.local
> 6: compute-0-15.local
> 7: compute-0-16.local
> 8: compute-0-19.local
> 9: compute-0-21.local
> 10: compute-0-17.local
> 11: compute-0-5.local
> 12: compute-0-20.local
> 13: compute-0-18.local
> 14: compute-0-12.local
> 15: compute-0-9.local
> 16: compute-0-4.local
> 17: compute-0-8.local
> 18: compute-0-14.local
> 19: compute-0-2.local
> 20: compute-0-6.local
> 0: compute-0-7.local
> 21: compute-0-10.local
>
> Dennis J. Gurgul
> Partners Health Care System
> Research Management
> Research Computing Core
> 617.724.3169
>
>
> -----Original Message-----
> From: npaci-rocks-discussion-admin at sdsc.edu
> [mailto:npaci-rocks-discussion-admin at sdsc.edu]On Behalf Of Gurgul,
> Dennis J.
> Sent: Monday, December 08, 2003 2:09 PM
> To: npaci-rocks-discussion at sdsc.edu
> Subject: [Rocks-Discuss]cluster-fork --mpd strangeness
>
>
> I just did "cluster-fork -Uvh
> /sourcedir/ganglia-python-3.0.1-2.i386.rpm"
> and
> then "cluster-fork service gschedule restart" (not sure I had to do the
> last).
> I also put 3.0.1-2 and restarted gschedule on the frontend.
>
> Now I run "cluster-fork --mpd w".
>
> I currently have a user who ssh'd to compute-0-8 from the frontend and
> one
> who
> ssh'd into compute-0-17 from the front end.
>
> But the return shows the users on lines for 17 (for the user on 0-8)
> and 10
> (for
> the user on 0-17):
>
> 17: 1:58pm up 24 days, 3:20, 1 user, load average: 0.00, 0.00,
> 0.03
> 17: USER TTY FROM LOGIN@ IDLE JCPU PCPU
> WHAT
> 17: lance pts/0 rescluster2.mgh. 1:31pm 40.00s 0.02s 0.02s
> -bash
>
> 10: 1:58pm up 24 days, 3:21, 1 user, load average: 0.02, 0.04,
> 0.07
> 10: USER TTY FROM LOGIN@ IDLE JCPU PCPU
> WHAT
> 10: dennis pts/0 rescluster2.mgh. 1:57pm 17.00s 0.02s 0.02s
> -bash
>
> When I do "cluster-fork w" (without the --mpd) the users show up on the
> correct
> nodes.
>
> Do the numbers on the left of the -mpd output correspond to the node
> names?
>
> Thanks.
>
> Dennis
>
> Dennis J. Gurgul
> Partners Health Care System
> Research Management
> Research Computing Core
> 617.724.3169
>
Federico
Rocks Cluster Group, San Diego Supercomputing Center, CA
From DGURGUL at PARTNERS.ORG Mon Dec 8 12:55:13 2003
From: DGURGUL at PARTNERS.ORG (Gurgul, Dennis J.)
Date: Mon, 8 Dec 2003 15:55:13 -0500
Subject: [Rocks-Discuss]cluster-fork --mpd strangeness
Message-ID: <BC447F1AD529D311B4DE0008C71BF2EB0AE15847@phsexch7.mgh.harvard.edu>
Thanks.
On a related note, when I did "cluster-fork service gschedule restart" gschedule
started with the "OK" output, but then the fork process hung on each node and I
had to ^c out for it to go on to the next node.
I tried to ssh to a node and then did the gschedule restart. Even then, after I
tried to "exit" out of the node, the session hung and I had to log back in and
kill it from the frontend.
Dennis J. Gurgul
Partners Health Care System
Research Management
Research Computing Core
617.724.3169
-----Original Message-----
From: Federico Sacerdoti [mailto:fds at sdsc.edu]
Sent: Monday, December 08, 2003 3:51 PM
To: Gurgul, Dennis J.
Cc: npaci-rocks-discussion at sdsc.edu
Subject: Re: [Rocks-Discuss]cluster-fork --mpd strangeness
You are right, and I think this is a shortcoming of MPD. There is no
obvious way to force the MPD numbering to correspond to the order the
nodes were called out on the command line (cluster-fork --mpd actually
makes a shell call to mpirun and it calls out all the node names
explicitly). MPD seems to number the output differently, as you found
out.
So mpd for now may be more useful for jobs that are not sensitive to
this. If enough of you find this shortcoming to be a real annoyance, we
could work on putting the node name label on the output by explicitly
calling "hostname" or similar.
Good ideas are welcome :)
-Federico
On Dec 8, 2003, at 11:28 AM, Gurgul, Dennis J. wrote:
> Maybe this is a better description of the "strangeness".
>
> I did "cluster-fork --mpd hostname":
>
> 1: compute-0-0.local
> 2: compute-0-1.local
> 3: compute-0-3.local
> 4: compute-0-13.local
> 5: compute-0-11.local
> 6: compute-0-15.local
> 7: compute-0-16.local
> 8: compute-0-19.local
> 9: compute-0-21.local
> 10: compute-0-17.local
> 11: compute-0-5.local
> 12: compute-0-20.local
> 13: compute-0-18.local
> 14: compute-0-12.local
> 15: compute-0-9.local
> 16: compute-0-4.local
> 17: compute-0-8.local
> 18: compute-0-14.local
> 19: compute-0-2.local
> 20: compute-0-6.local
> 0: compute-0-7.local
> 21: compute-0-10.local
>
> Dennis J. Gurgul
> Partners Health Care System
> Research Management
> Research Computing Core
> 617.724.3169
>
>
> -----Original Message-----
> From: npaci-rocks-discussion-admin at sdsc.edu
> [mailto:npaci-rocks-discussion-admin at sdsc.edu]On Behalf Of Gurgul,
> Dennis J.
> Sent: Monday, December 08, 2003 2:09 PM
> To: npaci-rocks-discussion at sdsc.edu
> Subject: [Rocks-Discuss]cluster-fork --mpd strangeness
>
>
> I just did "cluster-fork -Uvh
> /sourcedir/ganglia-python-3.0.1-2.i386.rpm"
> and
> then "cluster-fork service gschedule restart" (not sure I had to do the
> last).
> I also put 3.0.1-2 and restarted gschedule on the frontend.
>
> Now I run "cluster-fork --mpd w".
>
> I currently have a user who ssh'd to compute-0-8 from the frontend and
> one
> who
> ssh'd into compute-0-17 from the front end.
>
> But the return shows the users on lines for 17 (for the user on 0-8)
> and 10
> (for
> the user on 0-17):
>
> 17: 1:58pm up 24 days, 3:20, 1 user, load average: 0.00, 0.00,
> 0.03
> 17: USER TTY FROM LOGIN@ IDLE JCPU PCPU
> WHAT
> 17: lance pts/0 rescluster2.mgh. 1:31pm 40.00s 0.02s 0.02s
> -bash
>
> 10: 1:58pm up 24 days, 3:21, 1 user, load average: 0.02, 0.04,
> 0.07
> 10: USER TTY FROM LOGIN@ IDLE JCPU PCPU
> WHAT
> 10: dennis pts/0 rescluster2.mgh. 1:57pm 17.00s 0.02s 0.02s
> -bash
>
> When I do "cluster-fork w" (without the --mpd) the users show up on the
> correct
> nodes.
>
> Do the numbers on the left of the -mpd output correspond to the node
> names?
>
> Thanks.
>
> Dennis
>
> Dennis J. Gurgul
> Partners Health Care System
> Research Management
> Research Computing Core
> 617.724.3169
>
Federico
Rocks Cluster Group, San Diego Supercomputing Center, CA
From mjk at sdsc.edu Mon Dec 8 12:58:22 2003
From: mjk at sdsc.edu (Mason J. Katz)
Date: Mon, 8 Dec 2003 12:58:22 -0800
Subject: [Rocks-Discuss]PXE and system images
In-Reply-To: <Pine.LNX.4.44.0312081226270.19031-100000@scorpion.emsl.pnl.gov>
References: <Pine.LNX.4.44.0312081226270.19031-100000@scorpion.emsl.pnl.gov>
Message-ID: <4261C250-29C1-11D8-AECB-000A95DA5638@sdsc.edu>
On Dec 8, 2003, at 12:35 PM, Tim Carlson wrote:
> 5) In our case, we now quickly disconnect the PXE interface because we
> don't want to have the machine continually install. The real ROCKS
> method would have you choose (HD/net) for booting in the BIOS, but
> if you already
> have an OS on your machine, you would have to go into the BIOS twice
> before the compute nodes were installed. We disable rocks-grub and
> just
> connect up the PXE cable if we need to reinstall.
>
For most boxes we've seen that support PXE, there is an option to hit
<F12> to force a network PXE boot; this allows you to force a PXE boot even
when a valid OS/boot block exists on your hard disk. If you don't have
this, you do indeed need to go into the BIOS twice -- a pain.
-mjk
From fds at sdsc.edu Mon Dec 8 13:26:46 2003
From: fds at sdsc.edu (Federico Sacerdoti)
Date: Mon, 8 Dec 2003 13:26:46 -0800
Subject: [Rocks-Discuss]cluster-fork --mpd strangeness
In-Reply-To: <BC447F1AD529D311B4DE0008C71BF2EB0AE15847@phsexch7.mgh.harvard.edu>
References: <BC447F1AD529D311B4DE0008C71BF2EB0AE15847@phsexch7.mgh.harvard.edu>
Message-ID: <39CC5B05-29C5-11D8-804D-000393A4725A@sdsc.edu>
I've seen this before as well. I believe it has something to do with
the way the color "[ OK ]" characters are interacting with the ssh
session from the normal cluster-fork. We have yet to characterize this
bug adequately.
-Federico
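The escape-sequence theory above can be explored offline: Red Hat initscripts draw the colored "[ OK ]" status with ANSI CSI sequences (cursor positioning plus color codes), and one hedged workaround is stripping them before the output reaches the ssh session. A minimal Python sketch -- the regex is an approximation covering common CSI sequences, not a full ANSI parser:

```python
import re

# CSI escape sequences: ESC '[' , optional numeric parameters, final letter.
# Covers colors like '\x1b[1;32m' and cursor moves like '\x1b[60G' that
# initscripts use to align the '[ OK ]' column.
ANSI_CSI = re.compile(r'\x1b\[[0-9;]*[A-Za-z]')

def strip_ansi(text):
    """Remove CSI escape sequences, leaving plain text."""
    return ANSI_CSI.sub('', text)
```

Piping service output through such a filter (or setting BOOTUP=nocolor in /etc/sysconfig/init, a standard Red Hat knob) would test whether the escapes are really what wedges the session.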
On Dec 8, 2003, at 12:55 PM, Gurgul, Dennis J. wrote:
> Thanks.
>
> On a related note, when I did "cluster-fork service gschedule restart"
> gschedule
> started with the "OK" output, but then the fork process hung on each
> node and I
> had to ^c out for it to go on to the next node.
>
> I tried to ssh to a node and then did the gschedule restart. Even
> then, after I
> tried to "exit" out of the node, the session hung and I had to log
> back in and
> kill it from the frontend.
>
>
> Dennis J. Gurgul
> Partners Health Care System
> Research Management
> Research Computing Core
> 617.724.3169
>
>
> -----Original Message-----
> From: Federico Sacerdoti [mailto:fds at sdsc.edu]
> Sent: Monday, December 08, 2003 3:51 PM
> To: Gurgul, Dennis J.
> Cc: npaci-rocks-discussion at sdsc.edu
> Subject: Re: [Rocks-Discuss]cluster-fork --mpd strangeness
>
>
> You are right, and I think this is a shortcoming of MPD. There is no
> obvious way to force the MPD numbering to correspond to the order the
> nodes were called out on the command line (cluster-fork --mpd actually
> makes a shell call to mpirun and it calls out all the node names
> explicitly). MPD seems to number the output differently, as you found
> out.
>
> So mpd for now may be more useful for jobs that are not sensitive to
> this. If enough of you find this shortcoming to be a real annoyance, we
> could work on putting the node name label on the output by explicitly
> calling "hostname" or similar.
>
> Good ideas are welcome :)
> -Federico
>
> On Dec 8, 2003, at 11:28 AM, Gurgul, Dennis J. wrote:
>
>> Maybe this is a better description of the "strangeness".
>>
>> I did "cluster-fork --mpd hostname":
>>
>> 1: compute-0-0.local
>> 2: compute-0-1.local
>> 3: compute-0-3.local
>> 4: compute-0-13.local
>> 5: compute-0-11.local
>> 6: compute-0-15.local
>> 7: compute-0-16.local
>> 8: compute-0-19.local
>> 9: compute-0-21.local
>> 10: compute-0-17.local
>> 11: compute-0-5.local
>> 12: compute-0-20.local
>> 13: compute-0-18.local
>> 14: compute-0-12.local
>> 15: compute-0-9.local
>> 16: compute-0-4.local
>> 17: compute-0-8.local
>> 18: compute-0-14.local
>> 19: compute-0-2.local
>> 20: compute-0-6.local
>> 0: compute-0-7.local
>> 21: compute-0-10.local
>>
>> Dennis J. Gurgul
>> Partners Health Care System
>> Research Management
>> Research Computing Core
>> 617.724.3169
>>
>>
>> -----Original Message-----
>> From: npaci-rocks-discussion-admin at sdsc.edu
>> [mailto:npaci-rocks-discussion-admin at sdsc.edu]On Behalf Of Gurgul,
>> Dennis J.
>> Sent: Monday, December 08, 2003 2:09 PM
>> To: npaci-rocks-discussion at sdsc.edu
>> Subject: [Rocks-Discuss]cluster-fork --mpd strangeness
>>
>>
>> I just did "cluster-fork -Uvh
>> /sourcedir/ganglia-python-3.0.1-2.i386.rpm"
>> and
>> then "cluster-fork service gschedule restart" (not sure I had to do
>> the
>> last).
>> I also put 3.0.1-2 and restarted gschedule on the frontend.
>>
>> Now I run "cluster-fork --mpd w".
>>
>> I currently have a user who ssh'd to compute-0-8 from the frontend and
>> one
>> who
>> ssh'd into compute-0-17 from the front end.
>>
>> But the return shows the users on lines for 17 (for the user on 0-8)
>> and 10
>> (for
>> the user on 0-17):
>>
>> 17: 1:58pm up 24 days, 3:20, 1 user, load average: 0.00, 0.00,
>> 0.03
>> 17: USER TTY FROM LOGIN@ IDLE JCPU PCPU
>> WHAT
>> 17: lance pts/0 rescluster2.mgh. 1:31pm 40.00s 0.02s 0.02s
>> -bash
>>
>> 10: 1:58pm up 24 days, 3:21, 1 user, load average: 0.02, 0.04,
>> 0.07
>> 10: USER TTY FROM LOGIN@ IDLE JCPU PCPU
>> WHAT
>> 10: dennis pts/0 rescluster2.mgh. 1:57pm 17.00s 0.02s 0.02s
>> -bash
>>
>> When I do "cluster-fork w" (without the --mpd) the users show up on
>> the
>> correct
>> nodes.
>>
>> Do the numbers on the left of the -mpd output correspond to the node
>> names?
>>
>> Thanks.
>>
>> Dennis
>>
>> Dennis J. Gurgul
>> Partners Health Care System
>> Research Management
>> Research Computing Core
>> 617.724.3169
>>
> Federico
>
> Rocks Cluster Group, San Diego Supercomputing Center, CA
>
Federico
Rocks Cluster Group, San Diego Supercomputing Center, CA
From bruno at rocksclusters.org Mon Dec 8 15:31:08 2003
From: bruno at rocksclusters.org (Greg Bruno)
Date: Mon, 8 Dec 2003 15:31:08 -0800
Subject: [Rocks-Discuss]Rocks 3.0.0 problem:not able to boot up
In-Reply-To: <BAY3-F9yOi5AgJQlDrR0002a5da@hotmail.com>
References: <BAY3-F9yOi5AgJQlDrR0002a5da@hotmail.com>
Message-ID: <9979F090-29D6-11D8-9715-000A95C4E3B4@rocksclusters.org>
> I have installed Rocks 3.0.0 with default options successfully; there was
> no trouble. But when I boot it up, it stops at the beginning, just showing
> "GRUB" on the screen and waiting...
when you built the frontend, did you start with the rocks base CD then
add the HPC roll?
- gb
From bruno at rocksclusters.org Mon Dec 8 15:37:46 2003
From: bruno at rocksclusters.org (Greg Bruno)
Date: Mon, 8 Dec 2003 15:37:46 -0800
Subject: [Rocks-Discuss]custom-kernels : naming conventions ? (Rocks 3.0.0)
In-Reply-To: <30062B7EA51A9045B9F605FAAC1B4F622357C7@tardis0.quadrics.com>
References: <30062B7EA51A9045B9F605FAAC1B4F622357C7@tardis0.quadrics.com>
Message-ID: <8700A2BE-29D7-11D8-9715-000A95C4E3B4@rocksclusters.org>
> Previously I have been installing a custom kernel on the compute
> nodes
> with an "extend-compute.xml" and an "/etc/init.d/qsconfigure" (to fix
> grub.conf).
>
> However I am now trying to do it the 'proper' way. So I do (on :
> # cp qsnet-RedHat-kernel-2.4.18-27.3.10qsnet.i686.rpm
> /home/install/rocks-dist/7.3/en/os/i386/force/RPMS
> # cd /home/install
> # rocks-dist dist
> # SSH_NO_PASSWD=1 shoot-node compute-0-0
>
> Hence:
> # find /home/install/ |xargs -l grep -nH qsnet
> shows me that hdlist and hdlist2 now contain this RPM. (and indeed If
> I duplicate my rpm in that directory rocks-dist notices this and warns
> me.)
>
> However the node always ends up with "2.4.20-20.7smp" again.
> anaconda-ks.cfg contains just "kernel-smp" and install.log has
> "Installing kernel-smp-2.4.20-20.7."
>
> So my question is:
> It looks like my RPM has a name that Rocks doesn't understand
> properly.
> What is wrong with my name ?
> and what are the rules for getting the correct name ?
> (.i686.rpm is of course correct, but I don't have -smp. in the
> name. Is this the problem?)
the anaconda installer looks for kernel packages with a specific format:
kernel-<kernel ver>-<redhat ver>.i686.rpm
and for smp nodes:
kernel-smp-<kernel ver>-<redhat ver>.i686.rpm
we have made the necessary patches to files under /usr/src/linux-2.4 in
order to produce redhat-compliant kernels. see:
http://www.rocksclusters.org/rocks-documentation/3.0.0/customization-kernel.html
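The naming rule above lends itself to a quick mechanical check before dropping an RPM into the distribution tree. A small Python sketch -- the pattern is an illustration of the rule as stated (kernel-<kernel ver>-<redhat ver>.i686.rpm, with an optional -smp), not anaconda's actual parser:

```python
import re

# kernel[-smp]-<kernel ver>-<redhat ver>.i686.rpm, e.g.
# kernel-smp-2.4.20-20.7.i686.rpm. The <redhat ver> part may carry a
# vendor suffix but contains no further hyphens.
KERNEL_RPM = re.compile(r'^kernel(-smp)?-[\d.]+-[^-]+\.i686\.rpm$')

def looks_like_kernel_rpm(name):
    """True if the file name matches the installer's expected kernel format."""
    return bool(KERNEL_RPM.match(name))
```

By this rule, Daniel's qsnet-RedHat-kernel-2.4.18-27.3.10qsnet.i686.rpm fails because the name does not begin with "kernel-", which is consistent with anaconda silently ignoring it.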
also, would you be interested in making your changes for the quadrics
interconnect available to the general rocks community?
- gb
From purikk at hotmail.com Mon Dec 8 20:23:35 2003
From: purikk at hotmail.com (purushotham komaravolu)
Date: Mon, 8 Dec 2003 23:23:35 -0500
Subject: [Rocks-Discuss]AMD Opteron
References: <200312082001.hB8K1KJ24139@postal.sdsc.edu>
Message-ID: <BAY1-DAV65Bp80SiEmA00005c14@hotmail.com>
Hello,
I am a newbie to ROCKS clusters. I wanted to set up clusters on
32-bit architectures (Intel and AMD) and 64-bit architectures (Intel and
AMD).
I found the 64-bit download for Intel on the website but not for AMD. Does
it work for AMD Opteron? If not, what is the ETA for AMD-64?
We are planning to buy AMD-64 bit machines shortly, and I would like to
volunteer for beta testing if needed.
Thanks
Regards,
Puru