Design Document 1 for OMF 5.3.X - 5.4
Note: this aims to summarise the discussions in our meeting of 01/09/10
Feature:
- OMF Support for resources that do not have a CMC
- OMF Support for federated experiments, i.e. experiment with resources from different aggregates
Assigned to Christoph, Jolyon, and Thierry - Duration 2 weeks
White Board:
General Comments:
- Resources will have a definite Life-Cycle:
- Life-cycle steps are: create, initialise, configure, start, stop, release
- configure should also be able to happen at any time between the original start and stop
- All these steps should be implemented in OMF, although initially some of them might not do anything
- for example, start might not do anything in a 5.3.X version
- A full state diagram will soon be attached to this page
- The create and delete states will be part of the resource discovery/association-to-slice phase
- This phase happens prior to the user running her experiment with the EC
- This phase is done using tools to interact with a Slice Manager. PlanetLab/ProtoGENI have a similar phase, where the users use the SFA tool to interact with their Slice Manager
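The life-cycle above can be read as a small state machine. The sketch below is illustrative only: the full state diagram is not yet attached, so the transition table, the state names, and the `Resource` and `LifeCycleError` classes are assumptions rather than the agreed OMF design. In particular, it assumes that configure is allowed before the first start and while the resource is started.

```python
# Hypothetical sketch of the resource life-cycle. The transition table is an
# assumption until the full state diagram is published.

# For each state: the steps allowed from it, and the state each step leads to.
TRANSITIONS = {
    "new":         {"create": "created"},
    "created":     {"initialise": "initialised", "release": "released"},
    "initialised": {"configure": "configured", "release": "released"},
    "configured":  {"configure": "configured", "start": "started",
                    "release": "released"},
    # configure may happen again between start and stop
    "started":     {"configure": "started", "stop": "stopped"},
    "stopped":     {"start": "started", "release": "released"},
    "released":    {},
}


class LifeCycleError(Exception):
    """Raised when a step is not allowed in the current state."""


class Resource:
    def __init__(self, name):
        self.name = name
        self.state = "new"

    def apply(self, step):
        """Apply one life-cycle step and return the new state."""
        allowed = TRANSITIONS[self.state]
        if step not in allowed:
            raise LifeCycleError(
                f"{step!r} not allowed in state {self.state!r}")
        self.state = allowed[step]
        return self.state


def run_steps(resource, steps):
    """Apply a sequence of steps in order and return the final state."""
    for step in steps:
        resource.apply(step)
    return resource.state
```

A step that does nothing in a 5.3.X version (for example start) would still go through `apply`, so the state is tracked even when no work is done.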
AM Comments:
- We will have a new AM service: Slice Manager
- Slice Manager AM will accept the following requests and perform the following tasks for now:
- Create_Slice:
- Inputs: "slice name" and "PubSub server hosting the slice"
- Tasks: create the PubSub groups OMF/sliceXYZ and OMF/sliceXYZ/resources on the given PubSub server
- Associate_Resource_to_Slice:
- Inputs: "slice name" and "resource list" and "PubSub server hosting the slice"
- Tasks: create the PubSub group OMF/sliceXYZ/resources/resourceX for each resource in the list, and send a request to the AM services responsible for the resources in the list to associate with sliceXYZ on the given PubSub server
- Note: in PlanetLab/ProtoGENI, an RSPEC (Resource Specification) is used instead of the "resource list"
- the RSPEC allows the definition of a list of resources & some initial configuration parameters for these resources
- our initial implementation will simply use "resource list", however it should be developed so that replacing this by an RSPEC-like form would require minimal changes
- Delete_Slice:
- Inputs: "slice name" and "PubSub server hosting the slice"
- Tasks: request all the AMs subscribed to this slice to leave it (i.e. send a broadcast "Deassociate_from_Slice" to OMF/sliceXYZ/resources), and delete the PubSub groups OMF/sliceXYZ and OMF/sliceXYZ/resources on the given PubSub server
- Note: The Slice Manager AM will need to keep records of the current active slices if it ever gets stopped and restarted
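The three Slice Manager requests can be sketched against an in-memory stand-in for the PubSub server. Everything named here (`InMemoryPubSub`, `SliceManager`, the `send_request` callback and its argument order) is hypothetical and only illustrates the task lists above; it is not an OMF or XMPP API. Deleting the per-resource groups in `delete_slice` is also an assumption, since the notes only mention the two top-level groups.

```python
# Hypothetical sketch of the Slice Manager AM requests.

class InMemoryPubSub:
    """Stand-in for a PubSub server: records each group and its creator."""

    def __init__(self):
        self.groups = {}  # group path -> name of the creator

    def create_group(self, path, creator):
        self.groups.setdefault(path, creator)

    def delete_group(self, path, requester):
        # Only the creator of a group may delete it.
        if self.groups.get(path) != requester:
            raise PermissionError(f"{requester} did not create {path}")
        del self.groups[path]


class SliceManager:
    NAME = "slice_manager"

    def __init__(self, servers, send_request):
        self.servers = servers            # server name -> InMemoryPubSub
        self.send_request = send_request  # callable(request, slice, resources, server)
        self.slices = {}                  # (slice, server) -> list of resources

    def create_slice(self, slice_name, server):
        ps = self.servers[server]
        ps.create_group(f"OMF/{slice_name}", self.NAME)
        ps.create_group(f"OMF/{slice_name}/resources", self.NAME)
        self.slices[(slice_name, server)] = []

    def associate_resource_to_slice(self, slice_name, resources, server):
        ps = self.servers[server]
        for res in resources:
            ps.create_group(f"OMF/{slice_name}/resources/{res}", self.NAME)
            self.slices[(slice_name, server)].append(res)
        # Ask the AMs responsible for these resources to join the slice.
        self.send_request("Associate_to_Resource_At_Slice",
                          slice_name, list(resources), server)

    def delete_slice(self, slice_name, server):
        ps = self.servers[server]
        # Broadcast first, so the AMs leave before the groups disappear.
        self.send_request("Deassociate_from_Slice", slice_name, [], server)
        # Assumption: per-resource groups are removed as well.
        for res in self.slices.pop((slice_name, server)):
            ps.delete_group(f"OMF/{slice_name}/resources/{res}", self.NAME)
        ps.delete_group(f"OMF/{slice_name}/resources", self.NAME)
        ps.delete_group(f"OMF/{slice_name}", self.NAME)
```

The `slices` dictionary plays the role of the record of active slices; to survive a stop and restart it would have to be written to durable storage, which this sketch does not do.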
- PXE, Frisbee, CMC, Inventory and Result AMs will need to subscribe to Slice-related PubSub groups:
- all of them are already subscribing to the System part of the PubSub tree in their own aggregate's PubSub server
- they now should accept the following request (most likely issued by a Slice Manager AM):
- Associate_to_Resource_At_Slice:
- Inputs: "slice name" and "resource list" and "PubSub server hosting the slice"
- Tasks: subscribe to the PubSub groups OMF/sliceXYZ/resources and OMF/sliceXYZ/resources/resourceX for each resource in the list at the given PubSub server
- and also (most likely issued by a Slice Manager AM on OMF/sliceXYZ/resources):
- Deassociate_from_Slice:
- Inputs: "slice name" and "PubSub server hosting the slice"
- Tasks: unsubscribe from all previously subscribed PubSub groups at and under OMF/sliceXYZ/resources on the given PubSub server
- Note: These AMs need to be able to re-subscribe to their previously subscribed PubSub groups in case of stop/restart
- Design issues:
- Who creates the PubSub group /OMF/sliceXYZ/resources/resourceA?
- Either the Slice Manager AM (on an Associate_Resource_to_Slice request) or the other AMs (on an Associate_to_Resource_At_Slice request)
- if other AMs, then we have a potential issue when deleting the slice's PubSub groups: only the group creator can delete a group, so we would have to keep track of which AM created each group
- if Slice Manager AM, only this AM has to keep track of which PubSub group it created
- solution: Slice Manager AM?
- How to implement "memory across stop/restart" of which PubSub groups an AM has subscribed to?
- Hard state kept at the AM itself? (e.g. the CMC AM keeps a record of its subscribed PubSub groups and resubscribes to them if it ever restarts)
- Soft state with a periodic refresh/update message from the Slice Manager AM? (e.g. the CMC AM periodically receives from the Slice Manager AM the list of PubSub groups it needs to subscribe to)
- Another alternative?
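As one way to picture the "hard state" option, the sketch below has an AM handle Associate_to_Resource_At_Slice and Deassociate_from_Slice while persisting its subscriptions to a local JSON file, so that it can re-subscribe after a restart. The `AggregateManager` class, the state-file format, and the `subscribe`/`unsubscribe` callbacks are all hypothetical; this is not a decision between the hard-state and soft-state options.

```python
# Hypothetical sketch of the "hard state" option for an AM (e.g. the CMC AM).
import json
import os


class AggregateManager:
    def __init__(self, name, state_file, subscribe, unsubscribe):
        self.name = name
        self.state_file = state_file
        self.subscribe = subscribe      # callable(server, group)
        self.unsubscribe = unsubscribe  # callable(server, group)
        self.subscriptions = set()      # (server, group) pairs
        self._restore()

    def _restore(self):
        """Re-subscribe to every group recorded before the last stop."""
        if not os.path.exists(self.state_file):
            return
        with open(self.state_file) as f:
            for server, group in json.load(f):
                self.subscribe(server, group)
                self.subscriptions.add((server, group))

    def _save(self):
        with open(self.state_file, "w") as f:
            json.dump(sorted(self.subscriptions), f)

    def associate_to_resource_at_slice(self, slice_name, resources, server):
        base = f"OMF/{slice_name}/resources"
        for group in [base] + [f"{base}/{r}" for r in resources]:
            if (server, group) not in self.subscriptions:
                self.subscribe(server, group)
                self.subscriptions.add((server, group))
        self._save()

    def deassociate_from_slice(self, slice_name, server):
        base = f"OMF/{slice_name}/resources"
        # sorted() copies the set, so removing entries while looping is safe.
        for srv, group in sorted(self.subscriptions):
            if srv == server and (group == base
                                  or group.startswith(base + "/")):
                self.unsubscribe(srv, group)
                self.subscriptions.discard((srv, group))
        self._save()
```

The soft-state alternative would drop the state file and instead rebuild `subscriptions` from the periodic refresh message, at the cost of a window after restart during which the AM is not subscribed.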
EC Comments:
- The EC will discover resources dynamically and will implement the resource Life-Cycle described above
- initialise phase:
- At the start of the experiment run, the EC will query the PubSub server for all the existing groups under /OMF/sliceXYZ/resources; each of these groups will be considered as a resource that has been associated with the slice
- The EC will send a CMC ON request to all the /OMF/sliceXYZ/resources/resourceX groups. The EC should not wait for a reply to these calls.
- Note: This scheme replaces the previous OMF 5.2 and 5.3 schemes, where the EC made a direct or indirect (via the CMC service) Inventory call to find out the list of resources.
- configure phase:
- Now the EC sends a "configure" request to all the resources (i.e. the /OMF/sliceXYZ/resources/resourceX groups) to request them to associate with the experiment Exp123
- The EC should wait for a response to that request from all the resources. The waiting time is set in the EC's stdlib
- Upon timeout of waiting for the response:
- if a number of retries has been defined, the EC should retry:
- re-initialise the resource (e.g. CMC ON request as in previous phase), resend the "configure", and re-wait for timeout. The retry number is set in EC's stdlib
- The EC proceeds with the experiments when:
- all the resources have responded positively to the configure request,
- or when all the timeout/retries have been executed and there are some resources which responded positively
- (unless the experimenter has defined a hard number of resources as a limit for the experiment to proceed - needs to be implemented in OEDL)
- Note: this replaces the previous ENROLL scheme that we had in OMF 5.2 and 5.3
- start and stop phases:
- For now, the EC does nothing in these phases, but they should still be implemented so that tasks can easily be added to them later
- Note: the current EC implements a hard stop on each resource, i.e. the current RESET sent to all resources at the end of the experiment
- this is not the same stop as the one mentioned here, which may happen multiple times in conjunction with start during an experiment runtime
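The initialise and configure phases above can be sketched as two small functions. This is a minimal illustration under stated assumptions: `discover_resources`, `run_configure_phase`, and the `power_on`, `configure`, and `wait_for_replies` callbacks are hypothetical names, the `retries` argument stands for the retry number set in the EC's stdlib, and `min_resources` stands for the hard limit that would still need to be added to OEDL. The waiting time is assumed to be handled inside `wait_for_replies`.

```python
# Hypothetical sketch of the EC initialise and configure phases.

def discover_resources(group_paths, slice_name):
    """Return the resource names found directly under /OMF/<slice>/resources."""
    prefix = f"/OMF/{slice_name}/resources/"
    names = []
    for path in group_paths:
        if path.startswith(prefix):
            name = path[len(prefix):]
            if name and "/" not in name:
                names.append(name)
    return sorted(names)


def run_configure_phase(resources, power_on, configure, wait_for_replies,
                        retries=0, min_resources=None):
    """Initialise and configure resources; return (proceed, ready, pending).

    power_on(resource)        sends the CMC ON request, without waiting
    configure(resource)       sends the "configure" request
    wait_for_replies(pending) blocks until the timeout and returns the set
                              of resources that replied positively
    """
    pending = set(resources)
    ready = set()
    # initialise phase: CMC ON to every resource, fire and forget
    for res in sorted(pending):
        power_on(res)
    attempt = 0
    while True:
        for res in sorted(pending):
            configure(res)
        ready |= wait_for_replies(pending) & pending
        pending -= ready
        if not pending or attempt >= retries:
            break
        attempt += 1
        # retry: re-initialise the silent resources before re-sending configure
        for res in sorted(pending):
            power_on(res)
    # Proceed when at least one resource (or the hard limit) is ready.
    needed = min_resources if min_resources is not None else 1
    return len(ready) >= needed, ready, pending
```

The start and stop phases would be further hooks called after this function returns; they are left out here because the EC does nothing in them for now.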
Task assignments:
- The development of the AM-related components (i.e. "AM Comments" above) is assigned to Jolyon (issue #388 & #389)
- The development of the EC initialise phase (i.e. "EC Comments" above) is assigned to Christoph (issue #387)
- The development of the EC configure phase (i.e. "EC Comments" above) is assigned to Thierry (issue #386)