Emulab FAQ: Operations and Policy: How do I swap out a firewalled experiment that has panicked or failed to swap out?


If you are not a testbed administrator, contact testbed-ops. This entry is for testbed admins.

Often a transient problem will cause the swapout of a firewalled experiment to fail, and it will be up to you, Testbed-Ops Man, to tear it down. We tend to be very cautious and leave such failed experiments swapped in, but disabled in a couple of ways. These ways are generally enumerated in the failure mail:

    Swapout of firewalled experiment <PID>/<EID> by <UID> failed!
    Admin intervention required:

    Failed to <SOMETHING> on <NODES>.

    Current state of <NODES>:

    Firewall is NOT in place
    All nodes set to admin mode
    All nodes are powered off
    Firewall cnet interface <NODE>:<PORT> disabled

The lines that matter are whether the nodes have been powered off and whether the firewall control net interface has been disabled. To successfully swap the experiment out, you may need to do one or more of the following:

  • Mark experiment as no longer "swapping". Occasionally, a failure will result in an experiment being left as "paniced" but in the swapping state. In this state there will be no menu item for swapping the experiment out. So you first need to go into the DB and fix up the state:
        update experiments set state='active' where pid='<PID>' and eid='<EID>';
  • Reenable the firewall node control net interface. If the failure message says "Firewall cnet interface ... disabled", then you will need to reenable the interface before re-trying the swapout. If you do not, cleanup of the firewall node can never succeed since it will never PXE boot. To reenable, use snmpit:
        wap snmpit -e <NODE>:<PORT>
    where NODE and PORT are as given in the error message; e.g., "pc1:4".
  • Power on the affected nodes. In most cases, you won't need to do this, since most experiments will go through the "zap boot blocks" phase, whose first step is to power off the nodes anyway. However, if the experiment was at security level 1 ("Blue"), the nodes will not go through the power-down step, and the regular node reload step will fail because the nodes cannot be power cycled (power cycling will not turn on a powered-off node). In this case, you will need to explicitly power on the machines with the power command:
        wap power on <NODES>
    where NODES is the list given in the failure message. Or if that list is too long to cut/paste, try:
        wap power on `wap node_list -e <PID>,<EID>`
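
As a rough aid, the mail-driven steps above can be sketched as a dry-run shell function. It reads a failure mail on stdin and only prints the snmpit and power commands the mail appears to call for; it never executes them. The function name swapout_plan is hypothetical (it is not an Emulab tool), and the sketch assumes the mail uses exactly the wording shown in the sample above. The DB state fix is left out because the mail does not say whether the experiment is stuck in the swapping state.

```shell
#!/bin/sh
# Dry-run sketch: read a failure mail on stdin and PRINT (never execute)
# the recovery commands it appears to call for.
# swapout_plan is a hypothetical name, not part of Emulab.
# Usage: swapout_plan <PID> <EID> < failure_mail
swapout_plan() {
    pid=$1 eid=$2
    mail=$(cat)

    # Pull NODE:PORT out of "Firewall cnet interface NODE:PORT disabled".
    iface=$(printf '%s\n' "$mail" |
        sed -n 's/^[[:space:]]*Firewall cnet interface \([^[:space:]]*\) disabled.*$/\1/p')
    if [ -n "$iface" ]; then
        # The firewall node cannot PXE boot until its cnet interface is back.
        printf 'wap snmpit -e %s\n' "$iface"
    fi

    # Powered-off nodes must be powered on explicitly
    # (this matters for security level 1 experiments).
    if printf '%s\n' "$mail" | grep -q 'All nodes are powered off'; then
        printf 'wap power on `wap node_list -e %s,%s`\n' "$pid" "$eid"
    fi
}
```

For example, "swapout_plan testbed fwtest < mail.txt" (with testbed, fwtest, and mail.txt standing in for a real PID, EID, and saved mail) prints the commands for review before you paste them into a shell on the boss node.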

Now you should be able to retry the swapout. If the swapout continues to fail, either while trying to boot into the admin MFS or while trying to run the boot block zapper, it is possible to avoid that step. If you do this, be aware that you are circumventing part of our security measures. So don't do this unless you are confident either that the nodes are not contaminated or that you will personally ensure that they will not boot from the disk. At any rate, you can avoid the disk zap step by doing the following:

  • Set the machines to boot into the admin MFS. We don't actually force them into the MFS right now. This step is just to ensure that once we power on the nodes in the following step, they will not automatically come up from the disk.
        wap node_admin -n on <NODES>
  • Clear the magic DB state. Mark the experiment as not paniced and at the lowest, firewall-enabled security level:
        update experiments set paniced=0,security_level=1
        where pid='<PID>' and eid='<EID>';
  • Perform standard cleanups. Reenable the firewall control net interface and power on the nodes:
        wap snmpit -e <NODE>:<PORT>
        wap power on <NODES>
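
Because this sequence bypasses a security step, it can help to lay out the exact commands and review them before running anything. The following is a minimal dry-run sketch that prints the steps above in order for a given experiment. The name bypass_plan and its argument order are hypothetical; the printed commands are the ones listed above, and nothing is executed.

```shell
#!/bin/sh
# Dry-run sketch: PRINT (never execute) the disk-zap bypass steps in order.
# bypass_plan is a hypothetical name, not part of Emulab.
# Usage: bypass_plan <PID> <EID> <NODE:PORT> <NODE>...
bypass_plan() {
    pid=$1 eid=$2 iface=$3
    shift 3
    nodes=$*    # remaining arguments, joined with spaces

    # 1. Make sure the nodes will not come up from disk when powered on.
    printf 'wap node_admin -n on %s\n' "$nodes"
    # 2. Clear the paniced flag and drop to the lowest firewalled level.
    printf "update experiments set paniced=0,security_level=1 where pid='%s' and eid='%s';\n" "$pid" "$eid"
    # 3. Standard cleanups: reenable the cnet interface, then power on.
    printf 'wap snmpit -e %s\n' "$iface"
    printf 'wap power on %s\n' "$nodes"
}
```

The SQL line is meant to be pasted into the DB client by hand; the three wap lines go to a shell. Keeping the function print-only leaves the decision to bypass the zapper with the admin.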

And now you can redo the swapout. If things are still screwed up, talk to Mike.