In close collaboration with Seoul National University's Structural Complexity Laboratory

 



Booting SC Lab Machines

In case anyone needs to boot the machines, here is what needs to be done:

SC

sc is the top machine in the rack

  1. Check that the external backup disk is turned on (it might not restart after a crash) - sc can boot without the backup disk, but it won't be able to back up properly if the disk is only turned on after sc has booted
  2. Boot the machine (this can take 15 minutes, and for the first 5 minutes there may not be anything showing on the screen while it does its memory check)
  3. Restart scteach (preferably by logging into sc on an nx console and running scteach in that) - normally, Bob will look after this
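Because sc can sit for several minutes with a blank screen, it is easy to mistake a slow boot for a failed one. The following is only a minimal sketch of a "wait up to 15 minutes" check, not part of the lab's own tooling. `wait_for_boot` and the `is_up` callable are hypothetical names; `is_up` stands for whatever check you trust (for example, an attempt to log in).

```python
import time
from typing import Callable


def wait_for_boot(is_up: Callable[[], bool],
                  timeout_s: float = 15 * 60,
                  interval_s: float = 30,
                  clock: Callable[[], float] = time.monotonic,
                  sleep: Callable[[float], None] = time.sleep) -> bool:
    """Poll is_up() until it returns True or timeout_s has elapsed.

    is_up is a hypothetical, caller-supplied check; the 15 minute default
    mirrors the boot time quoted for sc above.
    """
    deadline = clock() + timeout_s
    while True:
        if is_up():
            return True          # machine answered before the deadline
        if clock() >= deadline:
            return False         # still silent after the full wait
        sleep(interval_s)        # wait before asking again
```

The `clock` and `sleep` parameters exist only so the function can be exercised without really waiting 15 minutes.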

SC1

sc1 is the second machine from the bottom of the rack

  1. Turn on the external RAID array (the bottom machine in the rack) - sc1 _cannot boot without this_.
  2. Check that the external backup disk is turned on (it might not restart after a crash) - sc1 can boot without the backup disk, but it won't be able to back up properly if the disk is only turned on after sc1 has booted
  3. Turn on at least one blade (the blade won't boot properly at this point, but if no blade is powered on, sc1's cluster control software won't come up properly)
  4. Boot the machine (this can take a long time, though not as long as sc)
    1. If, after about ten minutes, the system is still syncing the RAID disks (all the RAID lights are flashing), then the boot has probably failed and you should reboot
      1. This problem might have been due to a flaky disk that eventually failed completely and had to be replaced, but we aren't completely sure
  5. Boot or reboot the blades once sc1 is running fully. You should do this even if pbsmon shows them as running.
  6. Run pbsmon (in the education menu on sc1) to check that all the blades have booted properly
  7. Restart sc1a (preferably by logging into sc on an nx console and running scteach in that) - normally, Bob will look after this
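
The ordering constraints in the sc1 steps can be written down as a small dependency table. This is only an illustrative sketch: the step names and the `order_problems` helper are hypothetical, and the table is one reading of the list above. In particular, the backup disk is treated as a prerequisite even though sc1 can boot without it, because backups won't work properly otherwise.

```python
# Hypothetical step names; each maps to the steps that should already be done.
SC1_PREREQUISITES = {
    "raid_on": [],
    "backup_disk_on": [],
    "one_blade_on": [],
    # sc1 cannot boot without the RAID array; the other two are needed
    # for backups and for the cluster control software respectively.
    "boot_sc1": ["raid_on", "backup_disk_on", "one_blade_on"],
    "reboot_blades": ["boot_sc1"],
    "run_pbsmon": ["reboot_blades"],
    "restart_sc1a": ["boot_sc1"],
}


def order_problems(sequence):
    """Return (step, missing_prerequisite) pairs for a proposed step order."""
    done = set()
    problems = []
    for step in sequence:
        for prereq in SC1_PREREQUISITES[step]:
            if prereq not in done:
                problems.append((step, prereq))
        done.add(step)
    return problems
```

An empty result from `order_problems([...])` means the proposed order respects every constraint in the table; any pair it returns names a step that was attempted too early and the step it was waiting on.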