Status of Scientific Computing Services

The scientific computing group offers several services. This page reports news on their status.

NFS was unavailable on the SC clusters for about 5 minutes on 5.9.2019

On 5.9.2019 we had an unplanned reboot of the NFS server for the SC clusters galaxy, sirius and polaris, from about 13:10 to 13:15. Please check your jobs and resubmit them if necessary. We apologize for any inconvenience.

Short NFS failure on 3.9.2019

The /nfs mount was unavailable on the SC cluster nodes of Galaxy, Sirius and Polaris on 3.9.2019 from about 17:20 until 17:45. Please check any jobs that were running during this period.
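To decide which jobs need a closer look, it can help to compare each job's run time with the outage window. The following is only a minimal sketch: the job IDs and times are made up for illustration, and the helper name overlaps_outage is an assumption, not part of any cluster tooling. Real start and end times could be taken, for example, from the Start and End fields of Slurm accounting.

```python
from datetime import datetime

# Outage window taken from the notice above (3.9.2019, about 17:20 to 17:45).
OUTAGE_START = datetime(2019, 9, 3, 17, 20)
OUTAGE_END = datetime(2019, 9, 3, 17, 45)


def overlaps_outage(job_start, job_end,
                    outage_start=OUTAGE_START, outage_end=OUTAGE_END):
    """Return True if the job was running at any point during the outage."""
    return job_start <= outage_end and job_end >= outage_start


# Hypothetical job records: (job id, start time, end time).
jobs = [
    ("1001", datetime(2019, 9, 3, 16, 0), datetime(2019, 9, 3, 17, 30)),
    ("1002", datetime(2019, 9, 3, 18, 0), datetime(2019, 9, 3, 19, 0)),
]

# Collect the ids of all jobs whose run time touches the outage window.
affected = [job_id for job_id, start, end in jobs if overlaps_outage(start, end)]
print(affected)
```

A job flagged this way is not necessarily broken; it only means the job ran while /nfs was unavailable and its output should be checked.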

The problem may have been caused by a high metadata load on the NFS directory. We apologize for any inconvenience.

galaxymaster02 was down (28.8.2019)

We encountered a network problem, and galaxymaster02 was offline for about 2 hours on 28.8.2019. Hadoop and the Slurm scheduler were not directly affected, but accumulo-db and Slurm accounting were unavailable.

State: resolved

Maintenance on the RStudio-Server: 27.+28.8.2019

We need to update our RStudio server and fix a hardware issue during a maintenance period on 27 and 28 August. The RStudio service will not be available during the maintenance. We apologize for any inconvenience.

State: Done (28.8.2019, 8:30)

Minor Maintenance on SC clusters on 27.+28.8.2019

We need to update the software on our SC clusters galaxy, sirius and polaris and update Slurm. For this we are planning a minor maintenance on 27 and 28 August.

Hadoop: Users of the shared Hadoop partition will notice short service interruptions, and running jobs could be interrupted.

Slurm: We expect the Slurm partition to stay functional, with only short interruptions in the availability of Slurm commands such as squeue or sbatch. Slurm jobs should keep running.

Galaxy101 will be restarted.

State: Done (2019-08-28, 10:30)
2019-08-22: Slurm nodes update their packages when they are idle.
2019-08-27: Hadoop nodes and masters have been updated.
2019-08-28: Galaxy101 has been updated and the Slurm version has been updated. The remaining Slurm nodes are waiting for their update time slot during normal operation.

Short power failure affected galaxy on 31.7.2019

The galaxy cluster suffered a short power outage on 31.7.2019, a couple of minutes after 11 am. All galaxy servers located in Leipzig were rebooted, and running jobs were aborted.

Last modified: 05.09.2019



