Kb147

Emulab FAQ: Testbed Operations: What should I do when nodes hang in

Normally, when something seems to have gone wrong with nodes in the "reloading" experiment, the first thing to try is to:

	nfree emulab-ops reloading pc001 ...

which will "free" the nodes. This is equivalent to clicking on the "Free Node" button on the admin-mode "Node Information" page in the web interface.

Then the system will notice that they are dirty and move them back into reloading. This will serve to reset some of the infrastructure (e.g., make sure a frisbeed is running) and is often sufficient to clear up transient problems.

Here are some useful diagnostic tools:

  • serial1:/usr/testbed/log/tiplogs/pc001.log - See if there is recent info in the console log.
  • boss:/usr/testbed/log/power.log - Once nodes go into reloading, they should be rebooted. Check whether the system attempted to reboot or power-cycle the machines. If nothing seems to be happening, is the outlet port set up for pc001?
  • boss:/usr/testbed/log/dhcpd.log - Will tell you if the node tries to PXE boot.
  • boss:/usr/testbed/log/bootinfo.log - Will show you what boss tells the node to do (should be mfs boss:/tftpboot/frisbee).
  • boss:/usr/testbed/log/tftp.log - Will show what the nodes are attempting to load. If all is working, they should be downloading files from /tftpboot/frisbee.
  • boss:/usr/testbed/log/frisbeed.log - Will show you if the node tries to JOIN and whether it eventually succeeds in loading the disk.
  • boss:/usr/testbed/log/tmcd.log - TMCD is the server side of the client self-configuration process. Look in the log to see when the node last reported in.
  • boss:/usr/testbed/log/stated.log - stated is the daemon that keeps tabs on everything that is happening.
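Walking through the boss logs listed above by hand gets tedious, so a small helper can pull the most recent line mentioning a node from each of them. The following is only a minimal sketch: the function name last_mentions and the BOSS_LOGS list are hypothetical, and it assumes a plain substring match is enough to find the node. Some logs may identify the node by IP or MAC address rather than by name, in which case pass that string as the token. The console log on serial1 is not covered.

```python
from pathlib import Path

# File names taken from the list above; on boss they live in
# /usr/testbed/log. This list is a convenience for the sketch only.
BOSS_LOGS = [
    "power.log", "dhcpd.log", "bootinfo.log", "tftp.log",
    "frisbeed.log", "tmcd.log", "stated.log",
]


def last_mentions(log_dir, token, names=BOSS_LOGS):
    """Return {log name: last line containing token, or None}.

    token is whatever string identifies the node in a given log
    (assumed here to be a plain substring such as "pc001").
    """
    result = {}
    for name in names:
        path = Path(log_dir) / name
        last = None
        if path.is_file():
            # errors="replace" so stray bytes in a log do not abort the scan
            with path.open(errors="replace") as fh:
                for line in fh:
                    if token in line:
                        last = line.rstrip("\n")
        result[name] = last
    return result
```

Calling last_mentions("/usr/testbed/log", "pc001") on boss and printing the result shows at a glance which stages have logged anything for the node and which have not.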

Check frisbeed.log to see if the machines ever make a request to the frisbee server. If they downloaded from /tftpboot/frisbee, but never attempted to start frisbee, the frisbee startup script failed.
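The check described above can be sketched as a small classifier over the two logs. This is an illustration under stated assumptions, not a description of the real log formats: it assumes that tftp.log lines contain the path /tftpboot/frisbee, that frisbeed.log lines contain the word JOIN, and that the same token identifies the node in both logs. The function name frisbee_stage and its return values are hypothetical.

```python
def frisbee_stage(tftp_lines, frisbeed_lines, token):
    """Classify how far a node got in the reload process.

    Returns one of:
      "joined"             - the node sent a JOIN to frisbeed
      "mfs-loaded-no-join" - it fetched from /tftpboot/frisbee but never
                             joined (the frisbee startup script likely failed)
      "no-mfs-download"    - no download from /tftpboot/frisbee was seen
    """
    # Assumed log shapes: substring matches only, no real parsing.
    fetched = any(
        token in line and "/tftpboot/frisbee" in line for line in tftp_lines
    )
    joined = any(token in line and "JOIN" in line for line in frisbeed_lines)

    if joined:
        return "joined"
    if fetched:
        return "mfs-loaded-no-join"
    return "no-mfs-download"
```

A result of "mfs-loaded-no-join" corresponds to the case where the console log is the next place to look.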

The most common cause is that the kernel in the frisbee MFS did not recognize your hard drive, or that the machine has a different type of hard drive than the node_types table indicates for that machine type. For example, node_types might say "ad" (IDE) while the machines really have "da" (SCSI). Look at the console output for one of the machines to determine which case applies.
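As a rough aid for that comparison, the console log can be scanned for disk device names. The sketch below assumes the console output names disks as a prefix plus a unit number (for example ad0 or da0). That format, the function names, and the sample lines in the comments are assumptions, and the expected prefix still has to be looked up by hand in node_types.

```python
import re

# Matches device names such as "ad0" or "da1" as whole words.
# Only the two prefixes mentioned in the text are considered.
DISK_RE = re.compile(r"\b(ad|da)\d+\b")


def disk_prefixes(console_text):
    """Return the set of disk prefixes ("ad", "da") seen in console output."""
    return {m.group(1) for m in DISK_RE.finditer(console_text)}


def disk_type_matches(console_text, expected_prefix):
    """True if the prefix recorded for this node type shows up on the console.

    expected_prefix is the value read manually from node_types.
    An empty result from disk_prefixes() suggests the kernel did not
    recognize any disk at all.
    """
    return expected_prefix in disk_prefixes(console_text)
```

If disk_prefixes returns {"da"} while node_types says "ad", the table entry is the likely culprit. An empty set points toward the kernel in the frisbee MFS not recognizing the drive.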