| Current |
Posted by jkitchin,
Tue Sep 15 11:20:26 2009 |
NumPy was upgraded today to 1.2.1. Let me know if you
have any problems. Thanks.
|
| Current |
Posted by jkitchin,
Mon Aug 31 15:32:12 2009 |
You can see a daily report of your disk usage at
http://beowulf.cheme.cmu.edu/status/home.html.
Alternatively, from the main status page at
http://beowulf.cheme.cmu.edu/cgi-bin/status.cgi click
on the Home partition link.
|
| Current |
Posted by jkitchin,
Tue Aug 25 21:05:33 2009 |
The cluster is starting to show its age in terms of
maintainability. We are planning a major upgrade of
the cluster operating system to make it easier to
manage and expand the cluster in the future. The
upgrade will likely take place over Christmas break
this year. Please visit
http://kitchingroup.cheme.cmu.edu/beowulf-computer-cluster
to read about our upgrade plans and to let us
know if there are particular needs you have. Thanks.
|
| |
Posted by root,
Fri Aug 7 16:32:01 2009 |
The cluster is scheduled to shut down at 11pm on
Saturday, August 8. It will be turned back on Monday
August 10. Sorry for the inconvenience; it is due to a
power outage.
|
| |
Posted by jkitchin,
Wed Jul 1 11:25:24 2009 |
The file server had to be rebooted today.
|
| |
Posted by root,
Wed Apr 8 10:43:01 2009 |
The queue should soon start killing jobs that exceed
their memory request by more than 10%. Please try to
estimate your memory needs accurately. The queue treats
memory as a consumable resource, so your job will block
other jobs if you ask for too much. Jobs with lower
memory requests have a higher priority than jobs with
larger requests.
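
To make the 10% rule concrete, here is a minimal sketch of the check described above. Only the 10% figure comes from this announcement; the function name, the megabyte units, and the assumption that usage exactly at the limit is tolerated are illustrative, and the queue's actual accounting may differ.

```python
# Sketch only: the 10% tolerance is from the announcement; everything else
# (names, MB units, treatment of the exact boundary) is an assumption.
TOLERANCE_PERCENT = 10


def exceeds_memory_limit(requested_mb: int, used_mb: int) -> bool:
    """Return True if usage is more than 10% above the memory request.

    Integer arithmetic is used so the boundary case is not affected by
    floating-point rounding.
    """
    return used_mb * 100 > requested_mb * (100 + TOLERANCE_PERCENT)


if __name__ == "__main__":
    # Hypothetical job that requested 1000 MB.
    for used in (900, 1100, 1101):
        print(used, exceeds_memory_limit(1000, used))
```

Under these assumptions, a job that requested 1000 MB could use up to 1100 MB before it becomes a candidate for being killed.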
|
| |
Posted by jkitchin,
Mon Apr 6 09:09:03 2009 |
The queue has been restarted. All running jobs were
terminated, but the jobs in the queue at the time of
the power outage are still in the queue.
|
| |
Posted by jkitchin,
Sun Apr 5 19:16:35 2009 |
An emergency power outage will cause the cluster to
shut down tonight.
|
| |
Posted by jkitchin,
Fri Feb 27 11:02:34 2009 |
The file server went down today, and it had to be
rebooted. The login node was also rebooted. The
running jobs should be ok.
|
| |
Posted by jkitchin,
Tue Dec 30 21:05:39 2008 |
The queue seems to be functioning. Feel free to try
submitting jobs now. There is new documentation at
http://beowulf.cheme.cmu.edu/matsim/. Please read it
before asking questions.
|
| |
Posted by jkitchin,
Mon Dec 22 10:56:11 2008 |
The new torque/maui system is functional now. You may
submit jobs to the queue. Everything is almost the
same as before:
1. Same queue names and policies.
2. A fair share policy has been implemented. This policy
will be tuned as needed to maximize cluster usage and
fairness.
3. Over the next few months new documentation pages will
be made to help you utilize the queue system effectively.
4. Start at http://beowulf.cheme.cmu.edu/beowulf/
5. The cluster monitoring pages are a little messed up for
now because the output of the torque commands is a little
different than the previous pbs commands.
|
| |
Posted by jkitchin,
Sun Dec 21 17:02:26 2008 |
The torque system has been installed and is minimally
functional. I am still trying to get Maui installed,
and the queue system is not yet stable. Please do not
submit jobs yet.
|
| |
Posted by jkitchin,
Wed Dec 17 13:34:31 2008 |
The new queue system has temporarily broken the
web-based interface to the cluster.
|
| |
Posted by jkitchin,
Wed Dec 17 13:34:05 2008 |
A new queue system is being installed. I don't
recommend submitting any jobs until you are officially
notified the queue is ready for use. If you want to
play around with it, that is fine though.
|
| |
Posted by jkitchin,
Tue Dec 16 20:56:52 2008 |
The queue is not accepting jobs. The current queue
will be removed and a new queue installed in the next
few days.
|
| |
Posted by jkitchin,
Tue Dec 16 20:34:26 2008 |
The queue system is not accepting jobs anymore in
preparation for the installation of a new queue system
next week. All jobs will be killed Monday 12/22.
Hopefully the new queue system will be fully
functional by the end of next week.
|
| |
Posted by jkitchin,
Sun Nov 30 16:44:29 2008 |
The login node had some faulty memory and rebooted
several times today. All jobs in the queue were lost.
Sorry for the inconvenience. The memory has been
replaced and appears to be working again.
|
| |
Posted by jkitchin,
Tue Nov 25 18:44:15 2008 |
The login node had to be rebooted, which unfortunately
killed all the jobs in the queue. Sorry for the
inconvenience. The queues should be running and
accepting jobs now.
|
| |
Posted by jkitchin,
Mon Nov 24 12:55:54 2008 |
The cluster will be shut down tonight due to a
scheduled power outage tomorrow morning. Sorry for the
inconvenience.
|
| |
Posted by jkitchin,
Tue Nov 11 18:46:40 2008 |
The cluster is being shut down in preparation for the
power outage tonight. The queues are disabled until
the cluster is restarted tomorrow.
|
| |
Posted by jkitchin,
Fri Sep 5 17:21:01 2008 |
The login node reboot messed up the queue system, and
all the running and queued job info was lost. Sorry.
|
| |
Posted by jkitchin,
Fri Sep 5 17:20:15 2008 |
the cluster
|
| |
Posted by jkitchin,
Fri Sep 5 12:28:59 2008 |
The login node was hung and had to be rebooted today.
I do not know why.
|
| |
Posted by jkitchin,
Thu Aug 7 07:29:42 2008 |
There will be another power outage tonight. The
cluster will be turned off at 6pm today and turned
back on tomorrow morning.
|
| |
Posted by jkitchin,
Fri Aug 1 14:07:42 2008 |
The queues are started again.
|
| |
Posted by jkitchin,
Fri Aug 1 12:56:46 2008 |
The queues have not been restarted. The nodes are
updating some software now. When that is done I will
restart the queues this afternoon.
|
| |
Posted by jkitchin,
Thu Jul 31 17:55:58 2008 |
The cluster is going down now! Please log out.
|
| |
Posted by jkitchin,
Wed Jul 30 11:08:11 2008 |
Correction: the cluster will be shut down Thursday
evening JULY 31. All jobs in the queue will be cleared
at that time.
|
| |
Posted by jkitchin,
Fri Jul 25 08:53:59 2008 |
The power shutdown has again been moved. Now the
cluster will be shut down Thursday evening, Aug 31. All
jobs in the queue will be cleared at that time before
it is restarted.
|
| |
Posted by jkitchin,
Wed Jul 23 14:29:31 2008 |
The power will be shut down next Tuesday, so I am
postponing restarting the queue system until then. I
still don't recommend you submit jobs unless you think
they will finish by then. The cluster will be shut down
Monday evening.
|
| |
Posted by jkitchin,
Tue Jul 22 07:31:40 2008 |
The queues are all suspended now, pending a restart of
the queue system. Please do not submit any new jobs.
The queue system will be restarted on Thursday to give
existing jobs a chance to finish. Sorry for the
inconvenience.
j
|
| |
Posted by jkitchin,
Wed Jul 16 07:39:29 2008 |
It appears the PBS system crashed last night. The
queue system is running again but the queues are
suspended while I investigate the cause.
|
| |
Posted by jkitchin,
Thu Jul 10 17:51:56 2008 |
The qstat command is working again. It is my fault
this happened, and there may be other problems. Let me
know if you have them.
j
|
| |
Posted by jkitchin,
Thu Jul 10 16:26:03 2008 |
There is a permissions problem with qstat right now. I
am investigating it.
|
| |
Posted by jkitchin,
Thu Jun 5 15:33:33 2008 |
We are in the process of bringing the cluster back
online. It should be up by the end of the week.
|
| |
Posted by jkitchin,
Sun Jun 1 13:32:48 2008 |
The cluster will be moved tomorrow. Shutdown will
start at around 8am Monday June 2. Power will be
rewired to the cluster sometime this week, and after
that everything will be turned back on.
|
| |
Posted by jkitchin,
Wed May 28 08:25:46 2008 |
The cluster should be back up again. All jobs were lost
due to the power outage. Please resubmit them. All
servers should be on.
|
| |
Posted by jkitchin,
Tue May 27 18:33:46 2008 |
The cluster will be shut down this evening around
midnight due to a planned power outage tomorrow. It
will be turned back on tomorrow morning.
|
| |
Posted by jkitchin,
Fri May 23 08:53:37 2008 |
For some reason the cluster shut itself down
yesterday. I don't know why.
|
| |
Posted by jkitchin,
Wed May 21 17:56:28 2008 |
The cluster move is rescheduled again due to 3rd floor
renovations. The new shutdown date is scheduled for
June 2. The nodes will be shut off that morning and the
racks moved to the new cluster room. Once the
electrician rewires the power to the nodes and the
cooling is turned on the nodes will be returned to
service as they are rewired.
|
| |
Posted by jkitchin,
Tue May 13 14:37:24 2008 |
The cluster shutdown and move is currently postponed.
I am not sure when it will happen, but I hope it will
be around May 20 now.
|
| |
Posted by jkitchin,
Mon Apr 28 15:35:00 2008 |
The cluster move date has been changed to May 14 for
now. More information to come.
|
| |
Posted by jkitchin,
Fri Apr 25 08:20:55 2008 |
We plan to shut the cluster down at midnight May 5 in
preparation for moving it back to its renovated room.
It will take a few days before it is back up because
we have to schedule the electrician to reconnect power
to the cluster.
|
| |
Posted by jkitchin,
Tue Mar 11 11:18:03 2008 |
The login node was badly overloaded today, and we had to
reboot it. Unfortunately that also means we had to
kill all the jobs on the nodes to prevent new jobs
from being scheduled on nodes running old jobs. Sorry
for the inconvenience.
|
| |
Posted by jkitchin,
Sun Mar 2 13:09:25 2008 |
The cluster is only partially up. The servers are
running, but there is insufficient cooling and power
to run many of the nodes right now. Hopefully this is
fixed Monday.
|
| |
Posted by jkitchin,
Fri Feb 29 08:24:21 2008 |
The cluster will be shut down in a few minutes. Please
log out.
|
| |
Posted by jkitchin,
Fri Feb 15 08:11:10 2008 |
The cluster is planned to be shut down early in the
morning on Friday Feb. 29. It will be moved to a new
location temporarily during the renovation. All jobs
will be killed at that time. Please copy all data you
want to a local machine as I am unsure how long it
will take to move the cluster and get it back up.
Hopefully only 2-3 days.
|
| |
Posted by jkitchin,
Sat Jan 19 17:07:47 2008 |
All guest users should submit their jobs to the guest
queue. This is done by:
qsub -q guest -l cput=...,mem=... yourjob.sh
Jobs found in other queues will be removed.
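
For anyone scripting submissions, the command above can be assembled as in the following minimal sketch. The helper name and the sample `cput` and `mem` values are hypothetical placeholders, not recommended settings; only the `-q guest` and `-l cput=...,mem=...` form comes from this announcement. The sketch only builds and prints the command and does not call qsub.

```python
import shlex


def guest_qsub_command(script: str, cput: str, mem: str) -> list[str]:
    """Build the argument list for submitting `script` to the guest queue.

    `cput` and `mem` are passed through unchanged, so use whatever
    resource syntax the queue system on the cluster accepts.
    """
    return ["qsub", "-q", "guest", "-l", f"cput={cput},mem={mem}", script]


if __name__ == "__main__":
    # Hypothetical resource values, for illustration only.
    cmd = guest_qsub_command("yourjob.sh", cput="01:00:00", mem="500mb")
    print(shlex.join(cmd))
```

Keeping the command as a list of arguments means it could later be handed to `subprocess.run` without any shell quoting concerns.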
|
| |
Posted by jkitchin,
Tue Jan 1 13:49:51 2008 |
You can submit jobs now.
|
| |
Posted by jkitchin,
Sat Dec 29 08:49:29 2007 |
Since no new jobs can run anyway, the queues are
disabled. After all jobs finish I will do some cluster
maintenance before re-enabling the queues.
|
| |
Posted by jkitchin,
Sat Dec 29 08:46:46 2007 |
Jobs are not currently running because our PBS license
has expired. I requested new ones a few weeks ago and
hopefully they will arrive next week.
|
| |
Posted by jkitchin,
Sat Dec 15 13:14:15 2007 |
Cabinet 1 has been taken offline for maintenance. Jobs
will be allowed to keep running until Monday Dec 17 so
they can complete. At that point any jobs running on Cabinet 1
nodes will be killed so maintenance can be done.
|
| |
Posted by jkitchin,
Thu Dec 13 20:40:41 2007 |
There was some difficulty restarting the PBS server
today. Some jobs may have been lost.
|
| |
Posted by root,
Sun Nov 25 14:47:21 2007 |
The belt on the main cooling unit failed last night,
resulting in a cluster shutdown. The belt has been
repaired and the cluster nodes are back up.
|
| |
Posted by root,
Sun Nov 25 11:25:13 2007 |
You can't delete suspended jobs right now because the
cluster nodes are not on. It is not clear FMS will fix
this today, so it may be late tomorrow that everything
works again.
|
| |
Posted by root,
Sun Nov 25 10:35:12 2007 |
Please do not add more jobs to the queue system at
this time. Many jobs triggered the MPI status due to
their memory usage and will have to be deleted and the
queue restarted. It may happen that all of them get
deleted.
|
| |
Posted by root,
Sun Nov 25 10:32:31 2007 |
The main cooling unit is not on, so the cluster room
overheated and the cluster shut down. We will call FMS
tomorrow to have it looked at.
|
| |
Posted by root,
Fri Oct 19 07:57:24 2007 |
Please log out immediately. We are performing
maintenance on the file server and any changes to your
home directory may not be saved.
|
| |
Posted by root,
Mon Oct 15 07:47:13 2007 |
On Saturday, October 20 Doherty Hall will not have
electrical power from 6am to 6pm. We will shut the
beowulf cluster down at midnight on Friday. Please do
not submit jobs that will run past this time.
|
| |
Posted by root,
Sun Oct 14 17:55:09 2007 |
PBS server was restarted again due to beowulf shutting
down after the UPS was turned off. Sorry for the
inconvenience.
|
| |
Posted by root,
Thu Oct 11 17:23:38 2007 |
The PBS system had to be restarted Oct 11 after
beowulf appeared to be frozen. The reason is still
unknown, but the result is all jobs were lost. Please
resubmit them. Sorry for the inconvenience.
|
| |
Posted by steinhau,
Mon Jun 4 17:17:11 2007 |
Cabinets #1 and #2 have been shut down while the DH
renovation project affects the cluster room.
Hopefully, this will allow the rest of the computers
to be minimally affected by the temporary losses of
cooling seen and/or expected.
|
| |
Posted by steinhau,
Mon May 28 12:12:33 2007 |
The batch system was restarted at noon on Mon May 28
due to a fileserver error condition. All running jobs
were removed; sorry.
|
| |
Posted by steinhau,
Tue May 22 13:08:22 2007 |
FMS has added to the list of complete building
electrical outages as part of their tests. DH will be
without power 05/22 *and* 05/23 between 7pm and 3am.
All cluster machines will be shut down around 5pm today
(May 22). The servers will be restarted tomorrow
morning, but the nodes will not be put back online
again until Thu morning (May 24).
|
| |
Posted by steinhau,
Fri Dec 10 23:08:15 2004 |
A new qsub script has been implemented for batch jobs.
- all in-script PBS commands are ignored
- resource control is more strictly enforced
- a new '-vcpu' option allows for large memory jobs
Type "qsub" to see the syntax; for details, see note #4 in
http://beowulf.cheme.cmu.edu/cgi-bin/unixhelp.cgi#chapter_09
|
| |
Posted by steinhau,
Mon Jul 19 11:32:45 2004 |
New doc for remote login access; see Chpt 2 of
http://beowulf.cheme.cmu.edu/cgi-bin/unixhelp.cgi
|