Kb71

Emulab FAQ: Setting Up a New Emulab: Nodes not being reloaded


Free nodes are not getting reloaded.

The normal path for a node after it leaves an experiment (at swapout time) is as follows: the node is placed in the emulab-ops/reloadpending experiment, where the "reload daemon" (/usr/testbed/sbin/reload_daemon) notices it, moves it into emulab-ops/reloading, and then issues the appropriate os_load command.

If nodes are not being "freed" properly, there are a number of possible causes:

  • The reload daemon has died. Run "ps" and check whether /usr/testbed/sbin/reload_daemon is listed. If not, restart it.
  • The reload daemon is hung. If the reload daemon does not appear to be doing anything (e.g., nodes have been sitting in emulab-ops/reloadpending for a long time) but "ps" shows that it is running, then it is probably stuck. At Utah, the most common sticking point is a power cycle of a node attached to one of the serial-line-controlled RPC power controllers. The capture proxy monitoring that serial line sometimes thinks the line is busy and locks everyone out, so calling power on a node connected to such a controller hangs forever waiting for the serial line to become free. If this happens, look at /usr/testbed/log/powermon.log, the log of the monitor for the RPC power controllers, and see whether it contains a message about timing out on a particular controller. If so, restart the capture process for that line.
  • Nodes are stuck in emulab-ops/reloading. This happens for a variety of reasons, mostly related to heavy load during the boot process. The reload daemon makes only a modest attempt at resuscitating these nodes. The easiest way to recover them is to run: nfree emulab-ops reloading node ...
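
The first and last checks above can be scripted. The following is a minimal sketch, not part of the Emulab distribution: it assumes Python 3 is available on the boss node and that "ps axww" prints full command lines there. The function names are hypothetical, and the node names in the usage note are placeholders. It only tests whether the daemon path appears in the "ps" output and builds the nfree command line shown above; it does not detect a hung daemon.

```python
import subprocess

# Path of the reload daemon as given in the text above.
RELOAD_DAEMON = "/usr/testbed/sbin/reload_daemon"


def daemon_running(ps_output, daemon_path=RELOAD_DAEMON):
    """Return True if any line of `ps` output mentions the daemon path."""
    return any(daemon_path in line for line in ps_output.splitlines())


def nfree_command(nodes):
    """Build the nfree invocation that frees nodes from emulab-ops/reloading."""
    nodes = list(nodes)
    if not nodes:
        raise ValueError("need at least one node name")
    return ["nfree", "emulab-ops", "reloading"] + nodes


def live_ps_output():
    """Capture the current process list (assumes `ps axww` works on this host)."""
    result = subprocess.run(
        ["ps", "axww"], capture_output=True, text=True, check=True
    )
    return result.stdout
```

For example, daemon_running(live_ps_output()) reports whether the daemon is listed at all, and nfree_command(["pc1", "pc2"]) returns the argument list for freeing two (hypothetical) nodes, which can be reviewed before being passed to subprocess.run.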