Integration of OMF Control with PlanetLab testbeds

1. Goal:

Allow PlanetLab (PLC/PLE) users to control their experiments on PlanetLab testbeds with the Control tools provided by OMF.

In other words, allow a PLC/PLE user to use OMF's Experiment Controller (EC) to execute experiments that involve resources on PlanetLab, on an OMF-enabled testbed, or on both.

Please read our OMF Introduction Page or refer to the Glossary for more information about the EC.

2. Simple Scenario Overview:

  • 1 - the user requests a slice and adds resources (e.g. nodes) to it using the current PlanetLab procedure (e.g. via the web interface or SFA). During that step, the user explicitly indicates that she/he wants to use OMF to control her/his experiment.
  • 3 - the user submits the experiment description (ED) to the experiment controller (EC) running on his/her local machine
  • 4 - the EC interprets the ED and sends corresponding commands to the resource controllers (RC) running on each node involved in the experiment
    • examples of such commands are: 'start application X', 'configure device Z with parameter Y', etc.
  • 5 - a resource controller (RC) runs on each node (i.e. on each sliver) involved in the experiment. It executes any commands from the EC and sends back any replies/outputs.

3. Scope:

  • Version 1 of this development will only focus on steps 2 to 5 of the above. It assumes that, prior to using the EC, the user already has a slice and has assigned resources to it using the standard PlanetLab procedures. Deadline for a working demo of version 1 is: 15 March 2010.
  • Version 2 will add support for the above step 1. In other words, based on the user's experiment description, the EC will interface with PlanetLab mechanisms to request slices and assign resources to them on behalf of the user. Then the EC will execute the experiment as in the previous version.
  • The remainder of this page describes the design and implementation of version 1.

4. Design:

4.1. Communications:

Communications between the EC and the RCs are done via a Publish-and-Subscribe system (PubSub): entities post messages to a specific board/group and receive the messages posted to the boards/groups they have subscribed to. This is illustrated in Fig. 1 below, and it is how OMF communications are currently done.


Figure 1: Communication between EC and RCs
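
As an illustration of the PubSub model, the following minimal Python sketch keeps groups in memory. `ToyPubSub`, `subscribe` and `publish` are hypothetical names, not OMF's API, and stand in for the XMPP server with XEP-0060 that OMF actually uses. The point is that a publisher never addresses a receiver directly: it posts to a group, and every current subscriber of that group gets the message.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List

# A handler receives (group, message); hypothetical, not an OMF API.
Handler = Callable[[str, str], None]


class ToyPubSub:
    """In-memory stand-in for the PubSub server (XMPP + XEP-0060 in OMF)."""

    def __init__(self) -> None:
        self._subscribers: DefaultDict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, group: str, handler: Handler) -> None:
        """Register handler to receive every message posted to group."""
        self._subscribers[group].append(handler)

    def publish(self, group: str, message: str) -> int:
        """Deliver message to all subscribers of group; return how many got it."""
        handlers = list(self._subscribers.get(group, []))
        for handler in handlers:
            handler(group, message)
        return len(handlers)


if __name__ == "__main__":
    bus = ToyPubSub()
    rc_inbox: List[str] = []
    # The RC on node A listens on its group; the EC posts a command there.
    bus.subscribe("/OMF/sliceX/resources/A", lambda group, msg: rc_inbox.append(msg))
    bus.publish("/OMF/sliceX/resources/A", "start application X")
    print(rc_inbox)
```

Publishing to a group nobody has joined simply delivers to zero subscribers, which mirrors how an unused PubSub group behaves.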

Let us consider a simple experiment similar to the Hello World scenario, where 2 nodes (A="sender" and B="receiver") exchange some traffic. When such an experiment (let's call it "Experiment 1") is executed by a user assigned to a slice "Foo", the PubSub groups/boards are organised as described in Fig. 2 below:


Figure 2: PubSub group/board organisation

4.1.1. System (or testbed) Bootstrap & Communication

  • Grey-coloured groups in Fig. 2
  • The PubSub server is started as a daemon on a publicly accessible machine, e.g. by one of the machine's startup scripts
  • This startup script also needs to create the "OMF" and "system" groups/boards within the PubSub server
  • Then for each available resource X on the testbed, this startup script also has to create a group/board with the same name as the resource, under the "system" group
    • e.g. if the resource's HRN is "onelab10.inria.fr", then the PubSub group for that resource should also be named "onelab10.inria.fr"
  • The Aggregate Manager(s) (AMs) responsible for that specific resource should then subscribe to its particular PubSub group, e.g. "/OMF/system/onelab10.inria.fr"
    • When we need to manage this resource, we send the required commands to that PubSub group
    • The AM for that resource will receive these commands and execute them (e.g. start/stop sliver, load/save image, ...)
    • For example:
      • We want to start up a sliver on the resource X for a given slice Y
      • We send the relevant command to "/OMF/system/X"
      • The AM responsible for creating slivers on X listens on "/OMF/system/X" and receives the command
      • It creates the sliver on resource X, starts it, and instructs the Resource Controller of that sliver to subscribe to "/OMF/sliceY/resources/X"
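
The system bootstrap above can be sketched as two naming rules, assuming group paths are plain strings joined with "/". The function names and the `start_sliver` message dictionary are hypothetical; only the group layout comes from this page.

```python
from typing import Iterable, List, Tuple

ROOT = "/OMF"
SYSTEM = f"{ROOT}/system"


def system_bootstrap_groups(resource_hrns: Iterable[str]) -> List[str]:
    """Groups the testbed startup script creates, parents first:
    "/OMF", "/OMF/system", then one group per resource named after its HRN."""
    return [ROOT, SYSTEM] + [f"{SYSTEM}/{hrn}" for hrn in resource_hrns]


def start_sliver_request(resource_hrn: str, slice_name: str) -> Tuple[str, dict]:
    """(group, message) for asking the resource's AM to start a sliver.
    The message fields are hypothetical, not an OMF wire format."""
    return f"{SYSTEM}/{resource_hrn}", {"command": "start_sliver", "slice": slice_name}


def rc_group(slice_name: str, resource_hrn: str) -> str:
    """Group the AM tells the new sliver's RC to subscribe to."""
    return f"{ROOT}/{slice_name}/resources/{resource_hrn}"


if __name__ == "__main__":
    print(system_bootstrap_groups(["onelab10.inria.fr"]))
    print(start_sliver_request("onelab10.inria.fr", "sliceY"))
    print(rc_group("sliceY", "onelab10.inria.fr"))
```

Returning parents before children matters because a group can only be created once the group it sits under exists.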

4.1.2. Slice Bootstrap & Communication

  • Yellow-coloured groups in Fig. 2
  • When a new slice X is created, a new PubSub group with the same name should also be added to the PubSub server, e.g. "/OMF/sliceX"
  • Another PubSub group named "resources" should be added under this "/OMF/sliceX" group
  • When a new resource Y is added to an existing slice X, a new PubSub group with the name of that resource should be added under "/OMF/sliceX/resources"
  • Subsequently, when a sliver is instantiated on that resource Y for that slice X, the Resource Controller running on that sliver should subscribe to the PubSub group "/OMF/sliceX/resources/Y"
  • Simplification for Version 1:
    • As mentioned in "3. Scope" above, for the 1st version of this integration, it is assumed that, prior to using the EC to run an experiment, the user has already successfully requested a slice and assigned resources to it using the standard PlanetLab procedures
    • Therefore, for version 1, the above System and Slice bootstrap could be simplified as follows:
      • The PubSub server and the grey-coloured groups (e.g. "OMF", "system", ...) are initialised as above
      • When a user creates a slice and assigns resources to it, the PlanetLab tools must be extended to do the following:
        • create the relevant PubSub groups: "sliceX", "resources", "NODE Y", etc.
        • install and start a Resource Controller in each sliver corresponding to these resources
        • instruct the Resource Controller of sliver Y to subscribe to the PubSub group "/OMF/sliceX/resources/NODE Y"
  • For version 2 of this integration, we will extend the OMF Experiment Controller to directly perform the above Slice bootstrap (i.e. via the use of SFA)
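
Because the groups of a slice are created incrementally (once with the slice, then once per added resource), whichever tool performs the slice bootstrap only needs to create the groups that are still missing. This sketch assumes the tool can see the set of existing groups; the function name and the example HRNs are made up for illustration.

```python
from typing import List, Set


def groups_to_create(existing: Set[str], slice_name: str, resource_hrn: str) -> List[str]:
    """Groups still missing when resource_hrn is added to slice_name.
    Parents come first, so the result can be created in order."""
    slice_group = f"/OMF/{slice_name}"
    wanted = [
        slice_group,                                # created with the slice
        f"{slice_group}/resources",                 # container for its resources
        f"{slice_group}/resources/{resource_hrn}",  # one group per resource
    ]
    return [group for group in wanted if group not in existing]


if __name__ == "__main__":
    existing: Set[str] = {"/OMF", "/OMF/system"}
    for hrn in ["nodeA.example.org", "nodeB.example.org"]:
        new_groups = groups_to_create(existing, "sliceX", hrn)
        existing.update(new_groups)  # pretend the PubSub server created them
        print(new_groups)
```

Adding a first resource to "sliceX" creates all three levels; adding a second one only creates its own group, and re-adding a known resource creates nothing.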

4.1.3. Experiment Bootstrap & Communication

  • Blue-coloured groups in Fig. 2
  • In version 1, this is when the user launches the Experiment Controller (EC) and gives it an Experiment Description (ED) to execute
  • The EC creates a PubSub group for the Experiment, e.g. "/OMF/sliceX/experiment1"
  • The EC parses the ED, and for each set of resources in the ED:
    • It creates a corresponding PubSub group under "/OMF/sliceX/experiment1"
    • It instructs all the resources belonging to that set of resources to subscribe to this new PubSub group
    • For example:
      • If the ED describes a set "Receivers" which includes node M and N
      • the EC creates the PubSub group "/OMF/sliceX/experiment1/Receivers"
      • the EC publishes a message to the PubSub groups "/OMF/sliceX/resources/NODE M" and "/OMF/sliceX/resources/NODE N", requesting subscription to the "Receivers" group
      • the RC running on the sliver M and subscribed to "/OMF/sliceX/resources/NODE M" receives the message, and obeys by subscribing to "/OMF/sliceX/experiment1/Receivers"
      • Same is done by the RC running on the sliver N
  • For each resource Y used by the experiment, the EC will subscribe to the corresponding "/OMF/sliceX/resources/NODE Y" PubSub group
    • The RC running on the sliver Y will send reply messages to the EC by publishing them to the PubSub group "/OMF/sliceX/resources/NODE Y"
  • Following the experiment description, the EC then sends subsequent commands corresponding to the experiment tasks directly to the relevant PubSub groups under "/OMF/sliceX/experiment1/"
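
The experiment bootstrap steps above reduce to computing three lists from the ED's resource sets. In this sketch the ED is assumed to be already parsed into a dictionary mapping set names to resource names, resource names are used directly as group names (the page writes them as "NODE M"), and the "SUBSCRIBE <group>" text is a hypothetical placeholder for whatever request format the EC uses.

```python
from typing import Dict, List, Tuple


def experiment_bootstrap(slice_name: str, experiment: str,
                         resource_sets: Dict[str, List[str]]
                         ) -> Tuple[List[str], List[Tuple[str, str]], List[str]]:
    """Return (groups the EC creates, (group, message) pairs it publishes,
    groups the EC itself subscribes to). Message text is hypothetical."""
    slice_group = f"/OMF/{slice_name}"
    exp_group = f"{slice_group}/{experiment}"
    to_create = [exp_group]
    to_publish: List[Tuple[str, str]] = []
    ec_subscriptions: List[str] = []
    for set_name, members in resource_sets.items():
        set_group = f"{exp_group}/{set_name}"
        to_create.append(set_group)
        for resource in members:
            resource_group = f"{slice_group}/resources/{resource}"
            # Ask the RC on that sliver to join the set's group.
            to_publish.append((resource_group, f"SUBSCRIBE {set_group}"))
            # The EC listens here for the RC's replies (once per resource).
            if resource_group not in ec_subscriptions:
                ec_subscriptions.append(resource_group)
    return to_create, to_publish, ec_subscriptions


if __name__ == "__main__":
    for item in experiment_bootstrap("sliceX", "experiment1", {"Receivers": ["M", "N"]}):
        print(item)
```

For the "Receivers" example this yields the groups "/OMF/sliceX/experiment1" and "/OMF/sliceX/experiment1/Receivers", one subscription request each to M and N, and EC subscriptions to both resource groups. A resource that belongs to several sets receives one request per set, but the EC subscribes to its resource group only once.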

Communication Summary

Figure 3 summarises the above points:

Figure 3: PubSub group/board organisation (detailed)

4.2. Communication Security

  • Authentication and message encryption between PubSub clients (e.g. EC, RCs, AMs, etc.) and the PubSub server are already handled by the adopted PubSub solution, i.e. XMPP with the XEP-0060 (Publish-Subscribe) extension. The current implementations used by OMF are XMPP4R on the client side and OpenFire on the server side, both of which support these features.
  • The remaining issue is the authentication between an EC and the RCs associated with the slice and experiment that the EC is in charge of running
  • In the PlanetLab context, a set of public/private keys is associated with each user. The user uses these keys to authenticate to the system in order to request/create slices and add resources to them. When a user adds resources to one of his/her slices, his/her public key is passed to the corresponding sliver, to allow him/her to access this resource later.
  • For the EC/RCs authentication, we propose the following:
    • The user passes his/her keys to the EC when starting it
    • The initial message that the EC sends to an RC on a given sliver (i.e. share of a resource) will include the user's public key, signed by an authority
    • The RC verifies that this authority is within its trusted chain of authorities and, if so, checks with that authority that the user is actually allowed to use this resource for the given slice and duration
    • If so, the RC records that public key and replies to the EC with a positive response
    • Subsequent messages from the EC to the RC are then signed by the EC with the user's private key
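
The RC side of the proposed handshake is a small state machine: reject unknown authorities, check the authority's signature on the user's key, ask the authority whether the user may use this slice, then record the key and require it on every later message. The sketch below takes the signature check and the authority query as injected callables (hypothetical interfaces, since this page does not fix a signature scheme). Its demo helpers swap in HMAC-SHA256 purely so it runs with Python's standard library; HMAC is a shared-secret MAC, so a real deployment would plug in a public-key signature check instead.

```python
import hashlib
import hmac
from typing import Callable, Dict, Optional

# verify(key, payload, signature) -> bool. A real RC would check a public-key
# signature here; the HMAC helpers below are a stdlib-only stand-in and are
# NOT a substitute for public-key signatures in practice.
Verifier = Callable[[bytes, bytes, bytes], bool]
# Hypothetical query to the authority: may this user key use this slice now?
AuthorityCheck = Callable[[str, bytes, str], bool]


class RCAuthenticator:
    """RC-side state for the proposed EC/RC handshake (hypothetical API)."""

    def __init__(self, slice_name: str, trusted: Dict[str, bytes],
                 verify: Verifier, authority_allows: AuthorityCheck) -> None:
        self.slice_name = slice_name
        self.trusted = trusted                  # authority name -> its key
        self.verify = verify
        self.authority_allows = authority_allows
        self.user_key: Optional[bytes] = None   # recorded after the handshake

    def handle_initial(self, authority: str, user_key: bytes, key_sig: bytes) -> bool:
        """First EC message: the user's key plus the authority's signature on it."""
        authority_key = self.trusted.get(authority)
        if authority_key is None:                            # not in trusted chain
            return False
        if not self.verify(authority_key, user_key, key_sig):
            return False                                     # key not vouched for
        if not self.authority_allows(authority, user_key, self.slice_name):
            return False                                     # not allowed on slice
        self.user_key = user_key
        return True

    def accept(self, payload: bytes, sig: bytes) -> bool:
        """Later EC messages must be signed with the recorded user's key."""
        return self.user_key is not None and self.verify(self.user_key, payload, sig)


def hmac_sign(key: bytes, payload: bytes) -> bytes:
    return hmac.new(key, payload, hashlib.sha256).digest()


def hmac_verify(key: bytes, payload: bytes, sig: bytes) -> bool:
    return hmac.compare_digest(hmac_sign(key, payload), sig)


if __name__ == "__main__":
    authority_key, user_key = b"authority-secret", b"user-key"
    rc = RCAuthenticator("sliceFoo", {"PLC": authority_key}, hmac_verify,
                         authority_allows=lambda auth, key, slc: slc == "sliceFoo")
    print(rc.handle_initial("PLC", user_key, hmac_sign(authority_key, user_key)))
    print(rc.accept(b"start app", hmac_sign(user_key, b"start app")))
```

Until the handshake succeeds, `accept` rejects everything, so an EC cannot skip the initial message.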

5. Implementation

  • All slice and resource names will follow the PlanetLab HRN (Human Readable Name) convention

5.1. PlanetLab Tools (INRIA)

  • On the PlanetLab tool interfaces, an option is required for users to enable OMF control for their slice
  • If "enabled", when a slice is created the PlanetLab tools have to:
    • contact/authenticate with the PubSub server
    • create the new PubSub groups "/OMF/sliceX" and "/OMF/sliceX/resources"
  • If "enabled", when a new resource Y is added to a slice, the PlanetLab tools have to:
    • contact/authenticate with the PubSub server
    • create a new PubSub group for that resource, i.e. "/OMF/sliceX/resources/Y"
    • install the RC on the image for the sliver on that resource (if not already installed by default)
    • start the RC on that sliver, and provide it with the following parameters:
      • contact information of the PubSub server (hostname)
      • name of the Slice to which this resource is associated
      • name of this resource (its HRN)
    • ensure that the above is done if the sliver is rebooted
    • using these parameters, the RC will subscribe to the initial PubSub group: "/OMF/sliceX/resources/Y"
  • As some PlanetLab tools currently use XML-RPC to communicate between entities, and to minimise changes to their code, we propose to provide an XML-RPC interface to the PubSub server. To create PubSub groups, the PlanetLab tools will issue XML-RPC messages to this interface, which will in turn issue the relevant commands to the PubSub server.
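
On the RC side, the three start-up inputs listed above are enough to compute the first group to subscribe to. This sketch uses `argparse`; the flag names `--pubsub-server`, `--slice` and `--hrn` and the example hostname are assumptions, not the actual OMF RC options.

```python
import argparse
from typing import List


def parse_rc_args(argv: List[str]) -> argparse.Namespace:
    """Parse the three start-up inputs the PlanetLab tools pass to the RC.
    The flag names are hypothetical; the page only specifies the inputs."""
    parser = argparse.ArgumentParser(description="OMF RC start-up (sketch)")
    parser.add_argument("--pubsub-server", required=True,
                        help="hostname of the PubSub server")
    parser.add_argument("--slice", required=True,
                        help="name of the slice this sliver belongs to")
    parser.add_argument("--hrn", required=True,
                        help="HRN of this resource")
    return parser.parse_args(argv)


def initial_group(args: argparse.Namespace) -> str:
    """First group the RC subscribes to: /OMF/<slice>/resources/<hrn>."""
    return f"/OMF/{args.slice}/resources/{args.hrn}"


if __name__ == "__main__":
    args = parse_rc_args(["--pubsub-server", "pubsub.example.org",
                          "--slice", "sliceX", "--hrn", "onelab10.inria.fr"])
    print(args.pubsub_server, initial_group(args))
```

Making all three flags required means a misconfigured start script fails immediately instead of leaving an RC subscribed to the wrong group; the same command line can simply be re-run when the sliver reboots.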

5.2. XML-RPC Interface to PubSub Server (NICTA)

  • As mentioned above, this interface will accept XML-RPC messages and will issue corresponding commands to the PubSub server
  • The valid messages and corresponding commands should allow:
    • the authentication of the caller as a valid PubSub user (i.e. registered user on the PubSub server side)
    • creation of a PubSub group once authenticated
    • removal of a PubSub group once authenticated, if the group is owned by this user
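
A minimal sketch of such an interface with Python's standard `xmlrpc.server` module follows. The method names (`login`, `create_group`, `remove_group`), the session-token scheme and the in-memory user and group tables are all assumptions for illustration; a real gateway would forward each call to the PubSub server rather than keep its own tables.

```python
import secrets
from typing import Dict
from xmlrpc.server import SimpleXMLRPCServer


class PubSubGateway:
    """XML-RPC front end sketch. A real gateway would forward each call
    to the PubSub (XMPP) server instead of using these in-memory tables."""

    def __init__(self, users: Dict[str, str]) -> None:
        self._users = users                  # registered PubSub users: name -> password
        self._sessions: Dict[str, str] = {}  # token -> user name
        self._groups: Dict[str, str] = {}    # group path -> owner

    def login(self, user: str, password: str) -> str:
        """Authenticate a registered PubSub user; return a session token."""
        if self._users.get(user) != password:
            raise ValueError("authentication failed")
        token = secrets.token_hex(16)
        self._sessions[token] = user
        return token

    def create_group(self, token: str, path: str) -> bool:
        user = self._user(token)
        if path in self._groups:
            return False
        self._groups[path] = user
        return True

    def remove_group(self, token: str, path: str) -> bool:
        user = self._user(token)
        if self._groups.get(path) != user:   # missing, or owned by someone else
            return False
        del self._groups[path]
        return True

    def _user(self, token: str) -> str:      # leading "_": not exposed remotely
        if token not in self._sessions:
            raise PermissionError("not authenticated")
        return self._sessions[token]


def serve(gateway: PubSubGateway, host: str = "localhost", port: int = 8000) -> None:
    """Expose the gateway's public methods over XML-RPC (blocks forever)."""
    with SimpleXMLRPCServer((host, port)) as server:
        server.register_instance(gateway)
        server.serve_forever()
```

A PlanetLab tool could then call it through `xmlrpc.client.ServerProxy("http://localhost:8000")`, e.g. `token = proxy.login(user, password)` followed by `proxy.create_group(token, "/OMF/sliceX")`. Because XML-RPC calls are independent HTTP requests, the sketch returns a token from `login` and requires it on every later call; `register_instance` does not expose methods whose names start with an underscore.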

5.3. Experiment Controller (NICTA)

  • The EC needs to support the new naming scheme for resources
    • Old naming:
      • based on a [X,Y] coordinate couple, which was mapped to an IP address via an Inventory
      • this IP was in turn used as the name of the PubSub group, which was the initial contact point to the resource
      • users discovered available resources via an online testbed "map", or implicitly (e.g. resource [2,1] must exist if the testbed is said to have Xmax = Ymax = 3)
    • New naming:
      • arbitrary name following the HRN convention of PlanetLab
      • no implicit association between name and coordinate on a map (but info still available if required via an inventory service)
    • In the ED, the resources should be referred to by their HRN
  • At startup, the EC needs to access the slice information (HRN) and the user's public/private keys
    • These could be passed as command line arguments or as default parameters in the EC's config file
  • The EC needs to support the new PubSub communication tree as described above
    • creation of the required new PubSub groups when parsing the ED and finding a new set of resources
    • publishing requests to the corresponding resources to have them subscribe to these new groups
    • subscribing to the relevant PubSub group according to the ED
    • modification of the current EC communication module to fit the new convention "/OMF/sliceX/", etc...
  • Currently the EC interacts directly with the PubSub server, using XMPP messages via the XMPP4R library
    • we will keep this approach for this integration
    • we might later discuss the benefit of having (or not) an interface between EC/RC and the PubSub server for group creation/removal
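
Taking the slice HRN and key either from the command line or from the config file fits a "flag overrides config default" pattern. In this sketch the `[ec]` section, the `slice`/`key` options, the `--slice`/`--key` flags and the example HRN are all hypothetical; the actual EC options may differ.

```python
import argparse
import configparser
from typing import List


def load_ec_settings(config_text: str, argv: List[str]) -> argparse.Namespace:
    """Resolve the slice HRN and key path: command-line flags win, the
    [ec] section of the config file supplies defaults. The section, option
    and flag names are hypothetical."""
    config = configparser.ConfigParser()
    config.read_string(config_text)
    section = config["ec"] if config.has_section("ec") else {}
    parser = argparse.ArgumentParser(description="OMF EC settings (sketch)")
    parser.add_argument("--slice", default=section.get("slice"),
                        help="HRN of the slice the experiment runs in")
    parser.add_argument("--key", default=section.get("key"),
                        help="path to the user's private key")
    args = parser.parse_args(argv)
    if args.slice is None or args.key is None:
        parser.error("slice HRN and key are required (flag or config file)")
    return args


if __name__ == "__main__":
    cfg = "[ec]\nslice = plc.inria.sliceX\nkey = ~/.ssh/id_rsa\n"
    print(load_ec_settings(cfg, ["--slice", "plc.inria.sliceY"]))
```

Keeping the config values as argparse defaults means a single `--slice` flag is enough to rerun the same setup against another slice.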

Additional EC changes that we agreed to do here, but which are not required for a functional version 1 of this integration:

  • Support new Resource and Topology Description scheme in OEDL
  • Replace current "service calls" to AM services by an approach where:
    • AM services will subscribe to PubSub groups of their resources under the "/OMF/system" branch
    • EC makes new "service calls" by publishing requests on these groups

5.4. Resource Controller (NICTA)

  • The RC should support the new naming scheme as described above in the "5.3. Experiment Controller" section
    • Old scheme:
      • RC derives its own [X,Y] by retrieving its IP address on its control interface, and following the implicit convention that the last two parts of this IP correspond to X and Y
    • New scheme:
      • The entity that starts the RC passes the resource's HRN name to it
      • This could be done via a boot option of the sliver's virtual machine (as is currently done to assign the control interface)
      • In version 1, this would be the PlanetLab tools
      • In version 2, this would be the AM listening on "/OMF/system/ResourceY" and responsible for powering up/shutting down the resource
  • The RC needs to support the new PubSub communication tree as described above
    • modification of the current RC communication module to fit the new convention "/OMF/sliceX/", etc...
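
The difference between the two RC naming schemes can be sketched as follows. `old_scheme_coordinates` encodes the old implicit convention described above, and `rc_name` prefers the HRN handed over at start-up. The fallback to the old [X,Y] name when no HRN is given is a hypothetical transition policy, not something this page specifies.

```python
import ipaddress
from typing import Optional, Tuple


def old_scheme_coordinates(control_ip: str) -> Tuple[int, int]:
    """Old RC convention: the last two octets of the control-interface
    IPv4 address are the node's X and Y coordinates."""
    octets = ipaddress.IPv4Address(control_ip).packed
    return octets[2], octets[3]


def rc_name(hrn: Optional[str], control_ip: Optional[str] = None) -> str:
    """New scheme: use the HRN passed in by whoever started the RC.
    Falling back to the old [X,Y] name is a hypothetical transition policy."""
    if hrn:
        return hrn
    if control_ip is None:
        raise ValueError("RC started without an HRN or a control IP")
    x, y = old_scheme_coordinates(control_ip)
    return f"[{x},{y}]"


if __name__ == "__main__":
    print(old_scheme_coordinates("10.0.2.1"))
    print(rc_name("onelab10.inria.fr", "10.0.2.1"))
```

The new scheme removes any dependency on the address plan: a PlanetLab sliver's IP says nothing about a grid position, whereas its HRN is assigned explicitly.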

Additional RC changes: not required for version 1 on PlanetLab, but necessary for version 2 and for the current NICTA testbed

  • RC should be split in Resource Manager & Resource Controller (as mentioned in our early federation/virtualisation plans)
  • When a request to create a sliver arrives on "/OMF/system/NodeY"
    • AM processes it:
      • if no RM is up for that resource, then power on the resource and start its RM
        • New RM gets its HRN from the AM, and subscribes to the corresponding "/OMF/system/NodeY"
        • AM re-publishes the sliver creation message on "/OMF/system/NodeY"
      • if the resource is already powered up and an RM is already started, then do nothing
    • RM processes it:
      • RM starts a new virtual machine or sliver
      • RM starts a new RC in that sliver and passes it the HRN and the PubSub node details ("/OMF/sliceX/resources/NodeY")
      • The new RC starts and subscribes to "/OMF/sliceX/resources/NodeY"
    • This is the bootstrap sequence for v2, and also for v1 on a NICTA testbed; it produces the same result as the section 5.1 scheme does for a PlanetLab testbed
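
The bootstrap sequence above can be simulated in a few lines to check that it creates exactly one sliver per request, whether or not the RM was already running. Everything here is a toy model under stated assumptions: `Node`, the handler names and the log strings are hypothetical, and "publishing on /OMF/system/<HRN>" is modelled as calling each current subscriber (the AM always, the RM once started) in turn.

```python
from typing import Dict, List


class Node:
    """Toy state of one physical resource (hypothetical model)."""

    def __init__(self, hrn: str) -> None:
        self.hrn = hrn
        self.rm_running = False
        self.slivers: Dict[str, str] = {}  # slice name -> group its RC joined


def publish_create(node: Node, slice_name: str, log: List[str]) -> None:
    """A 'create sliver' request on /OMF/system/<hrn> reaches every current
    subscriber of that group: the AM always, the RM only if it is running."""
    rm_subscribed = node.rm_running
    am_handle_create(node, slice_name, log)
    if rm_subscribed:
        rm_handle_create(node, slice_name, log)


def am_handle_create(node: Node, slice_name: str, log: List[str]) -> None:
    if node.rm_running:
        return  # resource already up with an RM: the AM does nothing
    log.append(f"AM: power on {node.hrn} and start its RM")
    node.rm_running = True
    log.append(f"RM: subscribe to /OMF/system/{node.hrn}")
    log.append(f"AM: re-publish request on /OMF/system/{node.hrn}")
    publish_create(node, slice_name, log)  # now also reaches the new RM


def rm_handle_create(node: Node, slice_name: str, log: List[str]) -> None:
    group = f"/OMF/{slice_name}/resources/{node.hrn}"
    log.append(f"RM: start sliver for {slice_name} on {node.hrn}")
    log.append(f"RM: start RC with HRN {node.hrn}; RC subscribes to {group}")
    node.slivers[slice_name] = group


if __name__ == "__main__":
    node, log = Node("NodeY"), []
    publish_create(node, "sliceX", log)  # cold start: AM powers on, RM creates
    publish_create(node, "sliceZ", log)  # warm: only the RM acts
    print("\n".join(log))
```

The re-published request also reaches the AM, but by then the RM is running, so the AM ignores it and no duplicate sliver is created.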

5.5. Aggregate Manager (NICTA)

  • All the AM services should be modified to support the new "service calls" replacement scheme
    • AM services will subscribe to PubSub groups of their resources under the "/OMF/system" branch
    • EC makes new "service calls" by publishing requests on these groups
  • CM service:
    • CM has to subscribe to "/OMF/system/NodeX" for all the nodes it is responsible for
    • When receiving a command, CM has to authenticate the caller and check that the caller has the right to execute that command
    • If so, CM executes the command (e.g. reboot, power on/off node)
    • NOTE for version 1:
      • This does not concern the PlanetLab resources in this version, as their sliver creation/deletion is handled by the PlanetLab tools
      • In version 2, we may have a CM service, which will create slivers, and start/configure the RCs on them
  • Services Frisbee, PXE, Inventory, and Result:
    • These services do not act on PlanetLab resources
    • The same approach needs to be implemented for them
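
A CM service following this pattern could look like the sketch below. It assumes the PubSub layer has already authenticated the sender and passes its user name along; the permission table, the command names and the return strings are hypothetical, and `executed` merely records what a real CM would do to the node.

```python
from typing import Dict, Iterable, List, Set, Tuple


class CMService:
    """Sketch of the CM acting on "/OMF/system/<node>" requests
    (hypothetical API; authentication assumed done upstream)."""

    COMMANDS = {"reboot", "power_on", "power_off"}

    def __init__(self, nodes: Iterable[str], permissions: Dict[str, Set[str]]) -> None:
        # One subscription per node the CM is responsible for.
        self.groups = {f"/OMF/system/{n}": n for n in nodes}
        self.permissions = permissions           # user -> nodes they may control
        self.executed: List[Tuple[str, str]] = []

    def on_message(self, group: str, user: str, command: str) -> str:
        node = self.groups.get(group)
        if node is None:
            return "ignored: not subscribed"
        if command not in self.COMMANDS:
            return "error: unknown command"
        if node not in self.permissions.get(user, set()):
            return "denied"                      # caller lacks the right
        self.executed.append((command, node))    # stand-in for the real action
        return "ok"


if __name__ == "__main__":
    cm = CMService(["NodeX", "NodeY"], {"alice": {"NodeX"}})
    print(cm.on_message("/OMF/system/NodeX", "alice", "reboot"))
    print(cm.on_message("/OMF/system/NodeY", "alice", "reboot"))
```

The same skeleton would apply to the Frisbee, PXE, Inventory and Result services: only the command set and the action behind each command change.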

OMF-Communication.png (44.1 kB) Thierry Rakotoarivelo, 01/13/2010 04:48 pm

1-NICTA-Design-1.pdf (271.5 kB) Thierry Rakotoarivelo, 01/13/2010 05:33 pm

1-NICTA-Design-2.pdf (174.5 kB) Thierry Rakotoarivelo, 01/13/2010 05:33 pm

OMF-Communication-2.png (80.5 kB) Thierry Rakotoarivelo, 01/15/2010 02:27 pm

OMF-Communication-2B.png (123.3 kB) Thierry Rakotoarivelo, 01/15/2010 02:27 pm