Calendar

Availability Calendar

[Calendar grids for April 2007 through June 2008; the day-by-day color coding is not reproduced in text.]

Legend: Full System Full Day; Full System Partial Day; Partial System Full Day; Partial System Partial Day; Notable Event

Current Events

Outage Details

Downtime for April 2007

Start | End | Comments
04 Apr 08:00 | 04 Apr 12:30 | Replaced power supplies on two modules; upgraded MOAB to version 5.1.0p1
09 Apr 00:50 | 09 Apr 02:30 | System down. Jobs running at the time of the outage were killed; jobs in the queue (but not yet running) were not affected.
09 Apr 09:16 | 09 Apr 11:32 | System panic due to memory problem. Jobs running at the time of the outage were killed; jobs in the queue (but not yet running) were not affected.
09 Apr 16:44 | 09 Apr 20:08 | System down. Jobs running at the time of the outage were killed; jobs in the queue (but not yet running) were not affected.
09 Apr 22:05 | 10 Apr 02:22 | System panic due to memory problem. Jobs running at the time of the outage were killed; jobs in the queue (but not yet running) were not affected.
10 Apr 07:30 | 10 Apr 09:52 | System down due to memory problem. Identified and replaced problem component. Jobs running at the time of the outage were killed; jobs in the queue (but not yet running) were not affected.
12 Apr 00:03 | 12 Apr 10:53 | System down. Jobs running at the time of the outage were killed; jobs in the queue (but not yet running) were not affected.
13 Apr 10:02 | 13 Apr 16:58 | System down due to pump failure. Jobs running at the time of the outage were killed; jobs in the queue (but not yet running) were not affected.
24 Apr 16:48 | 24 Apr 18:36 | System rebooted after a panic. Jobs running at the time of the outage were killed; jobs in the queue (but not yet running) were not affected.
25 Apr 08:00 | 25 Apr 13:32 | During maintenance, the OS was upgraded to UNICOS/mp 3.1.30 and several DIMMs were replaced.
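
Each outage entry gives a start and an end timestamp without a year, so the length of an outage has to be computed with the year taken from the month heading. The following is only a minimal sketch in Python of that arithmetic. The helper names `outage_duration` and `total_downtime` are hypothetical and not part of this page, and the sketch assumes that both timestamps fall in the same calendar year and that month abbreviations are in English.

```python
from datetime import datetime, timedelta

def outage_duration(start, end, year):
    """Length of one outage; start and end look like '09 Apr 22:05'.

    Assumption: both timestamps belong to the given year.
    """
    fmt = "%d %b %H:%M %Y"
    began = datetime.strptime(f"{start} {year}", fmt)
    ended = datetime.strptime(f"{end} {year}", fmt)
    return ended - began

def total_downtime(rows, year):
    """Sum the durations of several (start, end) pairs."""
    return sum((outage_duration(s, e, year) for s, e in rows), timedelta())

# Two entries copied from the April 2007 log above.
april_2007 = [
    ("04 Apr 08:00", "04 Apr 12:30"),
    ("09 Apr 22:05", "10 Apr 02:22"),  # crosses midnight
]
print(total_downtime(april_2007, 2007))
```

Because full datetimes are subtracted rather than clock times, an outage that runs past midnight is handled without special cases.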


Downtime for May 2007

Start | End | Comments
02 May 22:12 | 03 May 03:30 | System down due to I/O channel error. Jobs running at the time of the outage were killed; jobs in the queue (but not yet running) were not affected.
09 May 08:20 | 09 May 09:00 | System upgraded to UNICOS/mp 3.1.31
14 May 13:40 | 14 May 15:57 | Power fluctuation caused the system to go down. Jobs running at the time of the outage were killed; jobs in the queue (but not yet running) were not affected.
14 May 23:08 | 15 May 00:48 | System rebooted after hardware failure caused a panic. Jobs running at the time of the outage were killed; jobs in the queue (but not yet running) were not affected.
19 May 13:47 | 19 May 16:05 | System rebooted after hardware failure caused a panic. Jobs running at the time of the outage were killed; jobs in the queue (but not yet running) were not affected.
23 May 08:15 | 23 May 12:30 | During maintenance, four fan speed controllers were replaced.
31 May 10:55 | 31 May 12:15 | System panic due to scalar TLB miss. Jobs running at the time of the outage were killed; jobs in the queue (but not yet running) were not affected.


Downtime for June 2007

Start | End | Comments
03 Jun 12:00 | 03 Jun 15:35 | Phoenix lost its connection to /spin. New logins and commands that accessed /spin were hanging. The system was rebooted and returned to service. Jobs running at the time of the reboot were killed; jobs in the queue (but not yet running) were not affected.
06 Jun 08:04 | 06 Jun 12:00 | Scheduled maintenance.
06 Jun 21:32 | 07 Jun 10:50 | A job failed due to a CRPE and this caused PBS to hang. The system was rebooted and returned to production use.
13 Jun 21:10 | 14 Jun 02:30 | System crashed due to site power outage. Jobs running at the time of the outage were killed; jobs in the queue (but not yet running) were not affected.
18 Jun 20:10 | 19 Jun 09:37 | The system hung, requiring a reboot.
20 Jun 08:00 | 20 Jun 11:38 | Scheduled maintenance. Replaced a power supply.
27 Jun 08:06 | 27 Jun 12:05 | Replaced a pump during scheduled maintenance.
28 Jun 22:00 | 29 Jun 10:30 | System stopped processing jobs and was rebooted. Jobs in a run state at the time of the outage were killed. Jobs in the queue (but not yet running) were not affected.


Downtime for July 2007

Start | End | Comments
03 Jul 01:43 | 03 Jul 09:39 | System stopped running jobs and was rebooted. During the outage, one power supply was replaced.
10 Jul 11:10 | 10 Jul 12:15 | System interaction was degraded, leading to a system reboot. Jobs running at the time of the outage were killed; jobs in the queue (but not yet running) were not affected.
10 Jul 14:04 | 10 Jul 14:51 | System interaction was degraded, leading to a system reboot. Jobs running at the time of the outage were killed; jobs in the queue (but not yet running) were not affected.
18 Jul 14:30 | 18 Jul 15:40 | System was rebooted to clear a job that was hung in an exiting state. Scheduling was stopped and jobs that were running were allowed to complete prior to the reboot.
24 Jul 00:16 | 24 Jul 02:13 | System crashed due to CRPE. Jobs running at the time of the outage were killed; jobs in the queue (but not yet running) were not affected.
24 Jul 11:46 | 24 Jul 13:08 | System became unresponsive and was rebooted. Jobs running at the time of the outage were killed; jobs in the queue (but not yet running) were not affected.
28 Jul 03:01 | 28 Jul 04:05 | System rebooted after a CRPE caused a system panic. Jobs running at the time of the outage were killed; jobs in the queue (but not yet running) were not affected.


Downtime for August 2007

Start | End | Comments
08 Aug 08:00 | 08 Aug 12:00 | Maintenance
15 Aug 08:10 | 15 Aug 11:10 | Scheduled maintenance
22 Aug 08:00 | 22 Aug 11:40 | Scheduled maintenance. A power supply was replaced and maintenance was performed on the cooling system in one of the cabinets.
22 Aug 23:57 | 23 Aug 00:49 | System crashed due to Kernel Mode Processor Parity Error. Jobs running at the time of the outage were killed; jobs in the queue (but not yet running) were not affected.
29 Aug 08:00 | 29 Aug 11:38 | During scheduled maintenance, replaced two power supplies and performed maintenance on one of the cabinets.
29 Aug 08:00 | 29 Aug 12:00 | Scheduled maintenance


Downtime for September 2007

Start | End | Comments
05 Sep 08:00 | 05 Sep 12:00 | System Maintenance
05 Sep 08:00 | 05 Sep 13:08 | Performed maintenance on one of the cabinets and replaced a power supply.
08 Sep 05:00 | 08 Sep 15:30 | System was unavailable while NFS mounted directories were moved to a new server.
10 Sep 02:00 | 10 Sep 03:47 | System became unresponsive and was rebooted. Jobs running at the time of the outage were killed; jobs in the queue (but not yet running) were not affected.
12 Sep 08:00 | 12 Sep 11:50 | Scheduled maintenance.
25 Sep 12:40 | 25 Sep 15:04 | System became unresponsive and was rebooted. Jobs running at the time of the outage were killed; jobs in the queue (but not yet running) were not affected.
26 Sep 08:00 | 26 Sep 11:40 | System maintenance. During maintenance, the default Programming Environment was changed to 5.6.0.3. NOTE: On the cross-compilers, the Programming Environment module is named 'PrgEnv-x1', while on phoenix it is still named 'PrgEnv'.


Downtime for October 2007

Start | End | Comments
03 Oct 08:00 | 03 Oct 11:45 | Hardware maintenance.
04 Oct 12:35 | 04 Oct 14:30 | System crashed due to a bad memory controller on one of the modules. The module was replaced and the system returned to service. Jobs running at the time of the outage were killed; jobs in the queue (but not yet running) were not affected.
10 Oct 08:00 | 10 Oct 12:16 | Hardware Maintenance
17 Oct 08:00 | 17 Oct 12:40 | Replaced a memory module and performed maintenance on one of the cabinets.
20 Oct 20:33 | 20 Oct 21:27 | System panic due to a processor parity error. System was rebooted and returned to service. Jobs running at the time of the outage were killed; jobs in the queue (but not yet running) were not affected.
24 Oct 08:00 | 24 Oct 12:30 | Performed hardware maintenance and installed patches. The new cross-compiler (robin1) was made the default and the DNS for robin.ccs.ornl.gov was changed to point to robin1.
28 Oct 02:30 | 28 Oct 03:48 | System stopped responding after /scratch/scr101 directory filled to 100%. The system was rebooted and a sweep was run on /scratch/scr101. Jobs running at the time of the outage were killed; jobs in the queue (but not yet running) were not affected.
29 Oct 00:00 | 29 Oct 00:00 | Due to low utilization, the debug reservation will no longer be held.
31 Oct 08:10 | 31 Oct 12:45 | Hardware maintenance


Downtime for November 2007

Start | End | Comments
07 Nov 08:00 | 07 Nov 11:45 | Hardware maintenance
08 Nov 00:00 | 08 Nov 01:43 | System lost connectivity with the NFS server and was rebooted. Jobs running at the time of the outage were killed; jobs in the queue (but not yet running) were not affected.
11 Nov 11:30 | 11 Nov 14:54 | System crashed due to a processor parity error on the boot node. Jobs running at the time of the outage were killed; jobs in the queue (but not yet running) were not affected.
13 Nov 12:02 | 13 Nov 12:55 | Neither robin nor the CPES could mount scratch filesystems from phoenix. Phoenix was rebooted and the problem cleared. Jobs running at the time of the outage were killed; jobs in the queue (but not yet running) were not affected.
14 Nov 09:30 | 14 Nov 11:21 | System crashed due to site power interruption. Jobs running at the time of the outage were killed; jobs in the queue (but not yet running) were not affected.
14 Nov 21:00 | 14 Nov 21:41 | System crashed due to a Kernel Mode Processor Parity Error.
22 Nov 12:40 | 22 Nov 18:38 | System rebooted to clear problems communicating with one of the NFS servers.
23 Nov 10:33 | 23 Nov 11:51 | System rebooted after a panic. After rebooting, jobs that were in a Deferred state were released.
23 Nov 21:21 | 24 Nov 01:03 | System rebooted due to Kernel Mode Processor Parity Error.
26 Nov 20:00 | 26 Nov 21:28 | Neither robin nor the CPES could mount scratch filesystems from phoenix. Phoenix was rebooted and the problem cleared. No jobs were running at the time of the outage. Jobs waiting to run were not affected.
27 Nov 14:48 | 27 Nov 16:21 | System panic/crash due to a bad hardware module. The module was disabled and the system returned to service.
28 Nov 08:00 | 28 Nov 12:05 | During system maintenance, replaced a DIMM and a hardware module. The system is once again running with all processors available.
30 Nov 04:05 | 30 Nov 05:34 | System rebooted due to CRPE hardware errors. The cause of the errors is under investigation.


Downtime for December 2007

Start | End | Comments
05 Dec 08:00 | 05 Dec 12:15 | Replaced three hardware modules during scheduled maintenance.
08 Dec 11:00 | 08 Dec 23:00 | Phoenix will be unavailable for login and batch processing during this time.
08 Dec 22:38 | 09 Dec 00:30 | System rebooted due to a system panic.
12 Dec 07:53 | 12 Dec 10:31 | During system maintenance, the OS was upgraded to UNICOS/mp 3.1.42
13 Dec 03:31 | 13 Dec 04:50 | System rebooted after a panic.
13 Dec 10:09 | 13 Dec 11:33 | System rebooted after a panic.
17 Dec 13:16 | 17 Dec 16:15 | System crashed due to an error on a hardware module. This module was replaced and the system was returned to service.
22 Dec 13:20 | 22 Dec 14:15 | System rebooted to clear problems accessing scratch directories on the CPES and robin1.
23 Dec 11:40 | 23 Dec 13:10 | System rebooted to clear issues accessing some filesystems
22 Dec 14:15 | 23 Dec 15:30 | System was unavailable for login and batch processing during this time.
23 Dec 15:30 | 23 Dec 16:30 | System rebooted and returned to general availability.
26 Dec 08:00 | 26 Dec 12:09 | Replaced a pump during scheduled maintenance.


Downtime for January 2008

Start | End | Comments
02 Jan 15:20 | 02 Jan 18:30 | Problems on several processors were preventing large jobs from starting. The system was rebooted to clear these problems.
09 Jan 10:28 | 09 Jan 12:42 | System crashed due to hardware error.
11 Jan 09:40 | 11 Jan 12:00 | System rebooted after a panic.
16 Jan 08:00 | 16 Jan 14:30 | System maintenance.
23 Jan 08:00 | 23 Jan 11:20 | Updated firmware on disk controllers during scheduled maintenance.
24 Jan 07:25 | 24 Jan 17:00 | Phoenix crashed due to a site power problem.
25 Jan 12:30 | 25 Jan 16:55 | System shut down due to work on the site power system.
28 Jan 21:13 | 28 Jan 22:44 | System rebooted after a panic.
30 Jan 08:20 | 30 Jan 12:35 | Updated firmware on a disk controller and replaced a power supply during scheduled maintenance.


Downtime for February 2008

Start | End | Comments
03 Feb 06:41 | 03 Feb 08:06 | System rebooted after a panic.
04 Feb 02:51 | 04 Feb 04:29 | System rebooted after a panic.
06 Feb 08:15 | 06 Feb 12:03 | Updated disk controller firmware during scheduled maintenance.
13 Feb 08:00 | 13 Feb 11:30 | Updated disk firmware during scheduled maintenance.
14 Feb 13:30 | 14 Feb 14:05 | System rebooted to clear problems with NFS.
15 Feb 10:15 | 15 Feb 11:00 | System rebooted to clear problems with NFS exports.
20 Feb 08:00 | 20 Feb 10:00 | Replaced a power converter during scheduled maintenance.
23 Feb 05:30 | 23 Feb 07:30 | System rebooted due to a software panic.
27 Feb 08:00 | 27 Feb 11:15 | System maintenance


Downtime for March 2008

Start | End | Comments
05 Mar 08:00 | 05 Mar 12:00 | The phoenix cross-compiler (robin) will be unavailable due to maintenance. Phoenix will remain in production.
12 Mar 08:00 | 12 Mar 14:10 | During maintenance, replaced a memory module and upgraded disk firmware.
14 Mar 16:08 | 16 Mar 19:58 | System unavailable due to site power outage.
19 Mar 08:00 | 19 Mar 15:10 | System maintenance.
26 Mar 08:00 | 26 Mar 11:15 | Replaced a pump during maintenance.


Downtime for April 2008

Start | End | Comments
02 Apr 08:00 | 02 Apr 11:50 | During scheduled maintenance, repaired the network connection between phoenix and the CPES and upgraded the OS to UNICOS/mp 3.1.46
09 Apr 08:00 | 09 Apr 11:45 | System maintenance
10 Apr 14:30 | 10 Apr 17:10 | System became unresponsive. A DIMM was replaced and the system was returned to service
11 Apr 12:40 | 11 Apr 13:50 | System rebooted after a panic.
17 Apr 14:22 | 17 Apr 17:56 | System rebooted after a panic
23 Apr 08:00 | 23 Apr 11:55 | System maintenance
25 Apr 11:47 | 25 Apr 15:45 | System crashed due to hardware failure.
28 Apr 16:00 | 28 Apr 22:30 | Phoenix and robin were not available for general use. Jobs running at the time of the outage were killed and rerun after the outage. If you had jobs running at the time please check your output to verify your job finished without any errors.
30 Apr 08:00 | 30 Apr 11:35 | System maintenance


Downtime for May 2008

Start | End | Comments
05 May 21:00 | 05 May 21:45 | System crashed due to Kernel Mode Processor Parity Error.
07 May 08:00 | 07 May 13:50 | Scheduled Maintenance.
14 May 08:00 | 14 May 12:00 | Scheduled Maintenance.


Downtime for June 2008

Start | End | Comments