Emulab Event System Reference
Introduction
The Emulab event system provides a means for automating your experiments. The event system consists of several types of "agents" that implement some sort of functionality, such as running programs or generating traffic, and a scheduler that triggers the events at the appropriate time. When your experiment is swapped in, any agents specified in your NS file are automatically set up on the experimental nodes and the ops node. A short time after the experiment becomes active, "event time" begins to flow. As event time progresses, any events scheduled in the NS file for a particular time offset are sent to the appropriate agents. Alternatively, events can be sent at runtime using the tevc command from ops or an experimental node. For a detailed walkthrough of using the event system, see the advanced example.
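As a minimal sketch of the static scheduling described above, the NS fragment below starts a program 10 seconds into event time and stops it at the 60-second mark. The names node0 and prog0 and the "server.sh" command are placeholders chosen for illustration; only the "$ns at", program-agent, start, and stop forms that appear elsewhere in this document are used.

```tcl
set ns [new Simulator]
source tb_compat.tcl

# Hypothetical single-node experiment; node0, prog0 and server.sh are
# illustrative names, not part of any real experiment.
set node0 [$ns node]
set prog0 [$node0 program-agent -command "server.sh"]

# Events are scheduled at offsets (in seconds) from the start of event time.
$ns at 10.0 "$prog0 start"
$ns at 60.0 "$prog0 stop"

$ns run
```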
Recently, we have added some experimental extensions to make the event system even more capable. Note that many of these features are subject to change and are only available when using the latest versions of the FBSD410-STD and RHL90-STD disk images. The NS file below gives an example of using these extensions to automate the process of creating disk images. First, it downloads a network traffic analyzer, iftop, then proceeds to build and install the software. Next, the source directory is removed and a snapshot is taken of the node's disk. Finally, after the snapshot completes and the node has finished rebooting, the experiment is swapped out.
    set opt(VERSION) 0.16

    set ns [new Simulator]
    source tb_compat.tcl

    set node [$ns node]
    tb-set-node-tarfiles $node \
        /tmp http://www.ex-parrot.com/~pdw/iftop/download/iftop-$opt(VERSION).tar.gz

    set builder [$node program-agent -dir "/tmp/iftop-$opt(VERSION)"]
    set cleaner [$node program-agent]

    set build [$ns event-sequence {
        $builder run -command "./configure"
        $builder run -command "gmake"
        $builder run -command "sudo gmake install"
    }]
    set clean [$ns event-sequence {
        $cleaner run -command "sudo rm -rf /tmp/iftop-$opt(VERSION)"
    }]
    set doit [$ns event-sequence {
        $build run
        $clean run
        $node snapshot-to RHL90-CUSTOMIZED
        $ns swapout
    }]

    $ns at 0.0 "$doit start"

    $ns run
An example NS file that automates the process of installing software on a node and taking a snapshot of the disk image.
We also have a small package containing a more complicated experiment that runs BitTorrent on a bunch of nodes, collects their output, and generates a simple report on how they performed: BitTorrent experiment package
The rest of this document is intended as a reference manual for the available set of agents and the events they can handle.
NS "Simulator" Agent
Constructor: new Simulator
The simulator agent provides control over your Emulab experiment as a whole. The simulator agent listens for the following events:
- swapout - Swap out the experiment.
- terminate - Terminate the experiment. Warning: This event will completely destroy every trace of the experiment and there is no confirmation.
- report [-digester script] - Automatically generate and send a "report" e-mail to the user. Typically, this event should be sent at the end of an experimental "trial", when all of the data has been produced and it is time to gather, analyze, and archive the data. Gathering and archiving the data is handled by the loghole utility, which copies log files on the nodes to the experiment's log directory. Simple analysis can be done by specifying a "digester" script that processes the log files. Once all of the data processing has been finished, an e-mail will be sent to the user containing the following:
  - The contents of any message events sent to the simulator agent.
  - The output from the digester script.
  - The captured NS file parameters.
  - Any log messages sent to the simulator, along with log messages automatically generated by the simulator.
  - For any programs that exited with an error, a description of the command that failed and the tails of their standard error and output files.
- message string - Append a string to the head of the e-mail sent by the report event.
- log string - Append a log message to the tail of the e-mail sent by the report event.
NS Examples:
    set ns [new Simulator]
    ...
    set doit [$ns event-sequence {
        $ns message "Testing one way, then the other..."
        $thisway run
        $thatway run
        $ns report
    }]
Example 1: Adds some text to the report e-mail, runs an application twice, and then sends a report to the user.
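The remaining simulator events can be combined in the same way. The sketch below is an illustration only: it assumes a user-supplied digester script, here given the placeholder name "digest.sh", and a placeholder test command "test.sh". The "-digester" option is used in the form given in the event list above.

```tcl
set ns [new Simulator]
source tb_compat.tcl

set node0 [$ns node]
set prog0 [$node0 program-agent]

# "test.sh" and "digest.sh" are placeholder names for user-supplied scripts.
# The message lands at the head of the report e-mail, the log entry at its tail.
set trial [$ns event-sequence {
    $ns message "Trial 1: baseline run"
    $prog0 run -command "test.sh"
    $ns log "Trial 1 finished"
    $ns report -digester "digest.sh"
}]

$ns at 0.0 "$trial start"
$ns run
```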
Event Sequence
Constructor: $ns event-sequence [body]
- body - The list of events to be sent. If none is specified, events can be added using the sequence's append method.
An event sequence agent is an ordered list of events, each of which is sent when the previous event in the list has reported its completion. For example, in a sequence consisting of a pair of events that run programs, the first event will be sent immediately and the second will be sent when the run of the first program completes. While running two programs in a row may be trivial using conventional means, this capability works across machines and can interact with other operations like reloading disks and rebooting machines.
The semantics of when an event "completes" depend on the type of agent and event. Many events complete instantaneously, such as those used to set a property, so the next event in the sequence is sent immediately. Other events, such as running a program, take a variable amount of time to complete. Some agents provide two types of events to support non-blocking and blocking operation, usually called start and run. Whereas the start event completes instantly, the run event blocks the sequence until the agent is finished.
Event sequences listen for the following events:
- start, run - Begins the execution of the sequence. When the run event is used inside another sequence, this sequence will complete when the last event completes.
NS Examples:
    set doit [$ns event-sequence {
        $prog0 run -command "setup.sh"
        $node0 reboot
        $prog0 run -command "test.sh"
    }]
Example 1: A sequence that performs some setup on a node, reboots it, and then starts the test.
    set doit [$ns event-sequence {
        $serverprog start; # Start the server,
        $clientprogs run;  # run the clients to completion, then
        $serverprog stop;  # stop the server.
    }]
Example 2: A sequence that asynchronously starts a server, runs some clients, and finally, stops the server.
    set testseq [$ns event-sequence]
    foreach test $tests {
        $testseq append "$prog0 run -command \"$test\""
    }
Example 3: A sequence that is constructed incrementally instead of being fully specified in the constructor.
Event Timeline
Constructor: $ns event-timeline
An event timeline agent sends other events at offsets relative to the overall start time of the timeline. In other words, a timeline is a first-class version of the existing "$ns at" syntax.
Event timelines listen for the following events:
- start, run - Starts the timeline. When run is used in a sequence, the timeline completes when it sends the last event.
NS Examples:
    set tl [$ns event-timeline]
    $tl at 0s "$prog0 start"
    $tl at 15s "$prog0 stop"
    set seq [$ns event-sequence {
        $tl run
        $ns swapout
    }]
Example 1: A timeline that runs a program for 15 seconds and then swaps out the experiment.
Program Agent
Constructor: $node program-agent [-command cmdline] [-dir dir] [-timeout seconds] [-tag string] [-expected-exit-code code]
- -command "cmdline" - Specifies the command line to run. Defaults to the last command that was run or the command specified in the NS file. See below for additional notes on command lines.
- -dir directory - Specifies the directory to run the command within. Defaults to the last directory that was specified, the directory in the NS file, or "/tmp".
- -timeout seconds - Specifies the timeout, in seconds, for the command or zero for no timeout. If the command does not complete before the timeout, it will be stopped forcefully. Defaults to the last timeout used for this agent or no timeout.
- -tag string - Specifies the symbolic tag to be attached to this invocation of the agent and its output log file names. By default, invocations are identified by a unique number, so this option allows the user to attach a more meaningful identifier.
- -expected-exit-code number - The expected exit code for the command. This value is compared against the actual exit code to determine whether or not the command completed successfully. An unsuccessful command run by a sequence will cause the sequence to stop executing and also fail. Defaults to the last value used or zero.
Program agents listen for the following events:
- start, run [options] - Starts the program by running the command line in the specified directory and capturing its standard output and error. The agent will then switch into "management" mode and only accept stop and kill events until the command terminates. The event accepts the same options as the constructor, so you can change the command to be run on the fly. The output from the command is stored in the "/local/logs" directory on the node. The output of each invocation of the agent is stored in separate files tagged with a unique id, with the stdout and stderr data kept apart in ".out" and ".err" files. To make it easier to locate the last invocation of the agent, soft links are created with file names that lack the unique id (e.g. "prog0.out" -> "prog0.out.5"). If a "tag" is specified, a soft link will also be created that refers to the actual file (e.g. "prog0.baseline.out" -> "prog0.out.5"). The command will be executed with the following environment variables set:
  - PATH - The default path for binaries is set to the standard path (e.g. /usr/bin, /bin, /usr/sbin, /sbin), the binary directories in /usr/local, and the directory containing Emulab specific binaries.
  - EXPDIR - The experiment's directory in NFS space (e.g. /proj/foo/exp/bar).
  - LOGDIR - The preferred directory for log files on the local machine.
  - USER - The name of the user that swapped in this experiment.
  - HOME - The path to the user's home directory.
  - GROUP - The name of the unix group for the user that swapped in this experiment.
  - PID - The project ID for the experiment this agent is running within.
  - EID - The experiment ID for the experiment this agent is running within.
  - NODECNET - The fully-qualified name of the node this program agent is running on. This name resolves to the IP address of the control network interface of the node.
  - NODECNETIP - The IP address of the control network interface. This address should not be advertised to, or used by, applications within an experiment as it will cause all traffic to flow over the control network rather than the experimental network.
  - NODE - The unqualified name of the node this program agent is running on. For nodes with experimental interfaces, this name resolves to the IP address of an experimental interface on the node. For nodes with more than one experimental interface, there is no guarantee which one it will resolve to. For nodes with no experimental interfaces, the name will not resolve.
  - NODEIP - The IP address of the experiment network interface that NODE resolves to. For nodes with no experimental interfaces, this variable will not be set.
  - set opt(VAR) values - Any entries in the "opt" array of the NS file will automatically be added to the environment. For example, to set a variable named "DURATION" with a value of "100", you would add "set opt(DURATION) 100" to the top of your NS file. See captured parameters.
- stop - Stops the program, if it is currently running, by sending a SIGTERM to the process group.
- kill signal - Signals the program with the given signal name. For example, to send a SIGHUP to the process you would use "sighup", or for tevc, "SIGNAL=SIGHUP".
- set - Sets the properties of the program agent. Accepts the same arguments as the start event.
Notes on Command Lines
In general, if you have complicated or multiple commands to execute, it is best to put them in a script and specify the script name in -command. But if you insist, here are some things to be aware of.
The command line is executed with "csh -c". Yes, that is the Berkeley C-shell and not the Bourne shell or bash. Sorry, it is an historical thing. So be aware of differences in redirection and expansion syntax (e.g., ">&" and "{}"). When in doubt, put your command in a script and set the command line to "sh -c myscript.sh".
Quoting is fragile and happens at a couple of levels:
- Quoting for TCL. Putting curly braces ({...}) or double quotes ("...") around the entire command line will quote the string to TCL (i.e., the NS script parser language). Double quotes allow TCL variable expansion; curly braces allow no expansion. In either case, the quotes are stripped off before the command line is given to csh. Use this mechanism if your command line has white space (i.e., arguments to the command), otherwise the Emulab NS parser will flag an error.
- Quoting for csh. Recall that the program agent runs a shell to interpret your commands, so you may need additional quoting to get special characters past it. For example, if one of your command arguments has an embedded space, you will need to quote it with single or double quotes. Backslash quoting also works.
A sick example might look like this:
... -command {echo arg{1,2} "arg3 has spaces" arg4\ has\ \'\ \'\ too}
where the echo command would have four arguments:
    arg1
    arg2
    arg3 has spaces
    arg4 has ' ' too
To summarize: put your commands in a script.
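The TCL level of quoting can be checked in isolation. The sketch below is plain Tcl that runs in a stock tclsh and involves no Emulab commands; the variable names are illustrative. It shows that double quotes substitute variables, while curly braces pass the text through literally, including the nested braces that csh would later expand.

```tcl
# Plain Tcl, runnable in tclsh; no Emulab commands are involved.
set opt(VERSION) 0.16

# Double quotes: Tcl substitutes $opt(VERSION) when the string is built.
set quoted "rm -rf /tmp/iftop-$opt(VERSION)"

# Curly braces: the text is taken literally, with no substitution.
set braced {rm -rf /tmp/iftop-$opt(VERSION)}

# Balanced inner braces and inner double quotes survive untouched,
# so csh is the first to interpret them.
set forcsh {echo arg{1,2} "arg3 has spaces"}

puts $quoted   ;# rm -rf /tmp/iftop-0.16
puts $braced   ;# rm -rf /tmp/iftop-$opt(VERSION)
puts $forcsh   ;# echo arg{1,2} "arg3 has spaces"
```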
Other Notes:
- Many of the features described here are only available on recent FBSD{410,54,61}-STD, RHL90-STD, and FC4-STD disk images.
- This page currently covers the agent only at a high level. You can find more detail in the program-agent(8) man page on ops or an experimental node.
NS Examples:
    set prog0 [$node0 program-agent]
    set prog1 [$node0 program-agent -command "/usr/bin/env"]
    set prog2 [$node0 program-agent -command "inf_loop_bug" -timeout 10]
    set prog3 [$node0 program-agent -command "ls" -dir "/foo/bar"]
Example 1: Creates four program agents with different default properties.
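Because the start and run events accept the same options as the constructor, defaults set at construction time can be overridden per invocation. The following is a sketch under that reading of the option list above; "setup.sh" and "expect_failure.sh" are placeholder script names, and the tags are arbitrary labels. Going by the log-file naming described earlier, the tagged invocations would be expected to leave soft links such as "prog0.setup.out" in /local/logs.

```tcl
set ns [new Simulator]
source tb_compat.tcl

set node0 [$ns node]

# Agent-wide defaults: run in /tmp and stop any command after 60 seconds.
set prog0 [$node0 program-agent -dir "/tmp" -timeout 60]

# "setup.sh" and "expect_failure.sh" are placeholder script names.
# The second invocation is considered successful only if it exits with code 1;
# any other exit code stops the sequence before the report is sent.
set doit [$ns event-sequence {
    $prog0 run -command "setup.sh" -tag setup
    $prog0 run -command "expect_failure.sh" -tag negative -expected-exit-code 1
    $ns report
}]

$ns at 0.0 "$doit start"
$ns run
```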
Event Group
Constructor: $ns event-group [list-of-agents]
- list-of-agents - A TCL list of the agents to be in the group.
The event group agent is used to broadcast events to a group of agents of the same type. For example, if you wanted to start a program on a large number of nodes at the same time, you can create a group consisting of those program-agents and send a single start event to the group. An event group can also act as a simple synchronization method when used inside an event-sequence. In this case, the next event in the sequence won't be sent until all of the agents in the group have signalled completion.
NS Examples:
    set group [$ns event-group]
    for {set i 0} {$i < 4} {incr i} {
        set nodes($i) [$ns node]
        set progs($i) [$nodes($i) program-agent]
        $group add $progs($i)
    }
    set doit [$ns event-sequence {
        $group run -command "setup.sh"
        $group run -command "client.sh"
    }]
Example 1: Runs the "setup.sh" script on a group of nodes and when they have all completed, runs the "client.sh" script.
    set group [$ns event-group [list $rnode $lnode]]
    set doit [$ns event-sequence {
        $group reboot
        $ns log "Reboot finished"
    }]
Example 2: Reboots a pair of nodes and logs a message with the simulator.
Node Agent
Constructor: $ns node
In addition to allocating an actual machine, the "$ns node" constructor will create a node agent so the node can be controlled from the event system.
Node agents listen for the following events:
- reboot - Reboot the node. When used in a sequence, this event will complete when the node has finished booting and is considered "up".
- snapshot-to imagename - Snapshot the node's disk into the given disk image. Before the snapshot is taken, the node's logs will be sync'd back to ops using the loghole utility and the "/local/logs" directory will be cleaned out. When used in a sequence, this event will complete when the snapshot has been taken and the node has finished booting and is considered "up".
- reload [-image imagename] - Reload the node's disk with the default image or the given image. When used in a sequence, this event will complete when the node has finished booting and is considered "up".
- setdest x y speed [-orientation degrees] - (mobile nodes only) This event will set the next physical destination for the node. When used in a sequence, this event will complete when the node has reached its destination. If another setdest event is sent to a node before it has reached its current destination, the new destination will overwrite the old one.
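Since reload, reboot, and snapshot-to each complete only once the node is back up, they can be chained in a sequence with program runs in between. The sketch below illustrates this using only the events listed above; "install.sh" is a placeholder script name, and the image names are the ones used as examples elsewhere in this document.

```tcl
set ns [new Simulator]
source tb_compat.tcl

set node0 [$ns node]
set prog0 [$node0 program-agent]

# "install.sh" is a placeholder for a user-supplied script.
# Each node event blocks the sequence until the node is considered "up",
# so install.sh only runs on the freshly loaded image.
set rebuild [$ns event-sequence {
    $node0 reload -image RHL90-STD
    $prog0 run -command "install.sh"
    $node0 snapshot-to RHL90-CUSTOMIZED
    $ns swapout
}]

$ns at 0.0 "$rebuild start"
$ns run
```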
Console Agent
Constructor: $node console
Console agents operate on the serial consoles attached to some Emulab nodes. Currently, they only support capturing a slice of the output received on the serial line.
Console agents listen for the following events:
- start - Start recording the serial console output from a node.
- stop id - Stop recording the serial console output from a node and save it to a file named "agentname-id.log" in the experiment's log directory.
Traffic Generator
Traffic generation agents output network traffic at a constant bit rate over a link. Consult the advanced example for more information and examples of their use.
Traffic generators listen for the following events:
- start - Start sending traffic.
- stop - Stop sending traffic.
- set - Change characteristics of the traffic.
Disk Agent
The disk agent can be used to create and modify virtual disks on test nodes. Its primary purpose is to help experimenters test how well their applications tolerate disk failures and errors.
Disk agents listen for the following events:
- start/run - Mounts a virtual disk with the given parameters. This is a simplified version of create/modify where users don't have to specify the geometry of the virtual disk.
- create - Creates a virtual disk. You must specify the complete geometry of the virtual disk and mount it manually.
- modify - Modifies a virtual disk's properties. You must specify the complete geometry of the virtual disk and mount it manually.
Notes
The disk agent uses the device mapper library to create and modify virtual disks with different properties. The syntax is similar to that of the dmsetup tool, but it comes nowhere near the full set of features that dmsetup provides. The virtual disks support these types:
- linear - A linear disk is simply a 1:1 mapping of the sectors from the virtual disk to the real disk.
- delay - A delay disk delays disk I/Os by a specified number of milliseconds. This is useful to simulate slow disks.
- flakey - This target is the same as the linear target except that it returns I/O errors periodically. It's been found useful in simulating failing devices for testing purposes. Starting from the time the table is loaded, the device is available for <up interval> seconds, then returns errors for <down interval> seconds, and then this cycle repeats.
- error - Useful to designate a particular sector to have I/O errors.
Please refer to the dmsetup documentation for more details, but note that not all features listed there are implemented here.
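To get a feel for the flakey up/down cycle, the share of each cycle spent returning errors can be worked out directly from the two intervals. The helper below is a plain Tcl sketch that runs in tclsh; the procedure name is made up for illustration and is not part of the disk agent. It measures the fraction of time the disk is failing, which only approximates the fraction of I/Os that fail.

```tcl
# Percentage of each up/down cycle during which a flakey disk returns errors.
# Both intervals are in seconds. "flakey_down_percent" is an illustrative
# helper, not an Emulab command.
proc flakey_down_percent {up down} {
    return [format "%.1f%%" [expr {100.0 * $down / ($up + $down)}]]
}

puts [flakey_down_percent 1 1]     ;# 50.0%
puts [flakey_down_percent 10 1]    ;# 9.1%
puts [flakey_down_percent 100 1]   ;# 1.0%
```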
Constructor: $node disk-agent [-type type] [-size size] [-mountpoint directory] [-parameters string] [-command cmdline]
- -type "type" - Specifies the type of virtual disk which could be one from the above list.
- -size "size in MB" - Specifies the size of the disk to be created. If it is not specified, a default size is allocated.
- -mountpoint "directory" - Specifies the mountpoint to mount the virtual disk.
- -parameters "string" - Specifies the optional parameters that the type supports. For example, the flakey type takes <up_interval> and <down_interval>, both in seconds: the disk works normally for <up_interval> seconds and then returns I/O errors for <down_interval> seconds. If you need roughly 50% of your I/Os to fail, the parameters to flakey would be '1 1'.
- -command "cmdline" - Specifies the complete geometry of the disk. The general format is
"<start sector> <size in sectors> <type> <device path> <offset> <additional parameters>", for example "0 10000 flakey /dev/sdb 0 1 1". This option must be used with the create or modify event types only; it is ignored if the event type in the NS file is start/run. Using this option does not mount the disk, which has to be done manually. The virtual disk appears as /dev/mapper/<name>. You can then use mkfs to create a filesystem on it and mount it somewhere to put it to use.
- Use the FEDORA15-DAGENT image on Emulab for now, since it has the disk agent installed.
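A geometry string is easier to get right when its fields are checked before it is handed to the agent. The plain Tcl sketch below, runnable in tclsh 8.5 or later, splits a geometry string into the fields named in the -command format and reports the disk size. The procedure name is hypothetical, and the size calculation assumes 512-byte sectors, as in device-mapper tables; the disk agent itself may differ.

```tcl
# Split a geometry string into its named fields and summarize it.
# "describe_geometry" is an illustrative helper, not an Emulab command.
# Assumption: sizes are counted in 512-byte sectors.
proc describe_geometry {geometry} {
    # lassign returns whatever is left over: the type-specific parameters.
    set extra [lassign $geometry start sectors type device offset]
    set mib [expr {$sectors * 512 / 1048576.0}]
    return [format "%s on %s: start=%d, offset=%d, %d sectors (%.2f MiB), parameters: %s" \
        $type $device $start $offset $sectors $mib $extra]
}

puts [describe_geometry "0 10000 flakey /dev/sdb 0 1 1"]
# flakey on /dev/sdb: start=0, offset=0, 10000 sectors (4.88 MiB), parameters: 1 1
```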
NS Examples:
Example 1: Creates a disk object disk0 of type "linear" with a size of 1000 MB. disk0 is mounted as a linear (good) disk at time 10 and starts returning I/O errors at time 20.
    set disk0 [$nodeA disk-agent -type "linear" -mountpoint "/mnt" -size 1000]
    $ns at 10 "$disk0 run"
    $ns at 20 "$disk0 run -type \"flakey\" -mountpoint \"/mnt\" -parameters \"1 1\""
Example 2: Creates a disk object disk0, but this time we specify the geometry. Note that with the create or modify events, the command line must be defined and the other fields are ignored. Using the same disk object with a different type modifies the disk. Here, disk0 starts out as a flakey disk and then becomes a slow disk where every I/O takes 50ms to complete.
    set disk0 [$nodeA disk-agent -command "0 10000 flakey /dev/sdb 0 1 1"]
    $ns at 10 "$disk0 create"
    $ns at 20 "$disk0 modify -command \"0 10000 delay /dev/sdb 0 50\""
Note: Please notice the usage of quotes above. In the "$ns at" syntax, we need to escape the quotes used inside the outer quotes.
We used the -command option in example 2 above, so a virtual disk with those properties is created. However, we must manually run mkfs on /dev/mapper/disk0 (from the example above) and mount it somewhere for the disk to be active.
Once a disk is turned into an error target, all I/Os to that disk will fail. This interferes with converting an 'error' target to another type, so it is best avoided.
Tevc Examples
We can dynamically create/modify virtual disks on nodes and inject errors with tevc.
Example 1: Create a virtual disk and mount it on a given mount point.
tevc -e experimentname now disk0 run disktype="linear" mountpoint="/mnt"
Example 2: Modify some property of a virtual disk. Let's say we want to change disk0 to a flakey disk or to a delay disk.
    tevc -e experimentname now disk0 run disktype="flakey" mountpoint="/mnt" parameters='1 1'
or
    tevc -e experimentname now disk0 run disktype="delay" mountpoint="/mnt" parameters="50"
Note
- If you specify an existing disk name with the run event, it is assumed that you want to modify the properties of that disk.
- You must quote the arguments while using tevc to accommodate spaces.
How to use Disk-agent with event type create/modify?
Basic syntax for create and modify is,
tevc -e experimentname now disk create/modify "<virtual disk name> <start sector> <size in sectors> <type> <device path> <offset> <additional parameters>"
Note: You must specify the complete geometry along with any additional parameters that this particular type of virtual disk supports. The virtual disk appears as /dev/mapper/<name>. You can then use mkfs to create a filesystem on it and mount it somewhere to put it to use.
Example 3:
Creating a linear type of virtual disk:
tevc -e experimentname now disk0 create command="0 10000 linear /dev/sdb 0"
Creating a flakey type of virtual disk with a 50% failure rate:
tevc -e experimentname now disk0 create command="0 10000 flakey /dev/sdb 0 1 1"
Creating a delay type of virtual disk which delays both read and write I/Os by 500ms:
tevc -e experimentname now disk0 create command="disk2 0 10000 delay /dev/sdb 0 500"
Note: With tevc we need to use 'disktype' rather than the '-type' option of the NS syntax, and the usage of quotes differs.
We cannot change the size of the disk once it is created.
Known Issues: The flakey target will fail I/Os depending on its parameters, which can cause the file system to go read-only. The higher the percentage of failing I/Os, the sooner the file system will go read-only. Some file systems generate I/Os as part of journalling, and those can fail as well.
Event Sequence
    set doit [$ns event-sequence {
        $disk0 run
        $disk0 run -type "flakey" -parameters "10 1"
        $disk1 start
    }]
Example 1: A sequence that makes disk0 and then changes it to be flaky and then starts disk1.
Event Timeline
    set tl [$ns event-timeline]
    $tl at 0s "$disk0 run"
    $tl at 20 "$disk0 run -type \"flakey\" -parameters \"100 1\""
    set seq [$ns event-sequence {
        $tl run
    }]
Example 1: A timeline that creates disk0 and 20s later makes it flaky [which is up for 100secs and down (returns I/O errors) for 1sec].