ShareEmail this to someoneShare on RedditTweet about this on TwitterShare on FacebookShare on Google+Share on LinkedIn

Workflows are sets of steps and connections between them that defines the algorithm – what it should be executed and how. It allows library to have great flexibility so it can implement different genetic algorithms that have different structures in a uniform way. Workflows also allow changes to be performed on algorithms in run-time.

Workflows are hierarchically organized in two levels. The top-level flow organizes parallel execution of the workflow, while the second-level flow contains steps that represent actual operation that should be executed by the algorithm. Connections between steps at the top-level flow are called branch groups. Branch groups have exactly on second-level flow and can have arbitrary number of branches that execute the its flow. While each branch in the group must execute the same second-level flow, it is not required that they execute it in the same way. Every branch gets its own thread from the thread pool when it needs to be executed.

Workflow example for coevolution GA

Each step is top-level flow is some form of barrier which joins inbound branch groups and/or fork new ones. Depending on the type of barrier it can join and fork arbitrary number branch groups. When barrier forks new branch groups, it starts required number of branches by assigning them threads from thread pool and sets the first step that it needs to execute, from there each branch can execute the flow in the different way. Branch finishes execution when there are no more steps to execute. Brunch group becomes ready for joining when all of its branches finish execution. Barrier that joins branch groups will wait for all inbound branches to become ready for joining before it forks outbound branch groups. These barriers should not be confused with threading barriers (represented by GaBarrier class in Common::Threading namespace).

Second-level flows, executed by the branches, are set of steps that can execute genetic operators, custom methods/functions or make decisions. Depending on its type, step can have no, one or several outbound connections to other steps of the flow, but each branch can choose one connection at most that it will follow. If it does not choose outbound connection there will be no next step to execute and that branch will finish its execution. Decision making steps can have several outbound steps for different outcomes of the decision, but each branch must choose only one possible outbound connection.

Workflows also has the third, special, kind of connections, that are used when one branch group needs to yield control to another group directly without executing a barrier step. These types of connections are called branch group transitions. And they perform checks to ensure that transition is valid and it will not cause ill effects such as deadlocks. These connections usually are used when implementation loops and decision making in the algorithms.

Example of branch group transitions (blues are valid and reds are invalid transitions)

Workflows offer option for storing data in dictionary-like structures called data storage. Each data object have to have ID which is used for retrieving data during the execution of the workflow. Data storages are organized hierarchically and each has its scope. Scope is defined by the level of the storage in hierarchy. Data IDs must be unique on each level, but storages at different levels or those with different scopes can have entries with same IDs. The scope and hierarchy of data storages is defined like this (lower number indicates lower order in hierarchy):

  1. branch level storage – data only available to the current branch
  2. branch group level storage – data available to all branches of the branch group
  3. workflow level storage – data available to all branches of the groups that belongs to the current workflow
  4. global level storage – data available to all branches in all workflows

Getting data from the storage can be done in two ways. The first way is to query single storage object at desired level for specified data ID. The other way is for user to specify range of storage object that should be searched and library will go from lowest level to highest level storage in search for data with specified ID.

Example of branch group transitions (blues are valid and reds are invalid)

Library uses reference counting to track use of data from storages so the user does not have to worry about memory management. Data will not be destroyed even after they are removed from the storage until all branches that are using them release references they obtained.

Implementation of workflows is located in Common::Workflows namespace. Three classes in the namespace: GaFlow, GaFlowStep and GaFlowConnection represent base for implementation of flows, flow steps and connections between the steps.

GaFlow has methods for adding and removing steps and connections between them: AddStep, RemoveStep, ConnectSteps and RemoveConnection. Step can only belong to a single flow at a time.

GaFlowStep class is base for flow steps. It stores inbound and outbound connections and exposes interface for flows to control ownership of the step and manage inbound and outbound connections as well the interface for the execution by the branches.

operator () is abstract and it must be overridden and should implementation actual logic of the step that will be executed. Enter can be overridden to prepare branch for the execution of the step or to determine whether the step should be executed by the branch. If method returns true, branch will proceed and execute the step, otherwise it will skip it and go to the next one. Exit is executed after the main part of the step is finished, but before branch starts execution of the next step. This method can be overridden to implement synchronization, clean-up or similar operations. GetNextStep method returns the next step that branch should execute. If it returns NULL branch will finish the execution. This method must be overridden.

BindToFlow and UnbindFromFlow are called when step is inserted into a flow or removed from it. FlowUpdated method is called by the library when changes on the owning flow occurs. So far it is only called when number of branches in the group that owns the flow is changed. AttachNextStep, AttachPreviousStep, DetachNextStep, DetachPreviousStep methods are straightforward, they handle adding and removing inbound and outbound connections. These methods must be overridden. GaBasicStep class inherits GaFlowStep and overrides AttachPreviousStep and DetachPreviousStep so the steps can handle multiple inbound connections which should be true for all step types in the library.

Connections between steps are represented by GaFlowConnection as the base class. Each connection can have exactly one inbound and one outbound step. ConnectSteps, ConnectOutboundStep and ConnectInboundStep methods handle attaching connections to the steps. and DisconnectSteps, DisconnectOutboundStep and DisconnectInboundStep handle detaching.

Connections also have IDs which helps steps to identify inbound connection which brought the branch or to select outbound step which should be executed next by the branch. SetConnectionID and GetConnectionID methods expose connection ID.

For more details see documentation of following classes:

Branches are represented by GaBranch class. StartBranch method is called by branch group when it is forked and it starts execution of the branch from specified step. SplitWork methods calculates workload that should be distributed between branches that execute current step.

Each branch has ID which is unique in the group. It is a number sequentially assigned starting from 0. If step has branch filter, filtered ID should be used for work distribution. Branch filtering is described later in the article. These two IDs are exposed by GetBranchID and GetFilteredID methods. Branch filter of current step is available through GetCurrentFilter method.

Branch also stores last decision made by the previously executed step and it can be used for choosing correct outbound connection to the next step. SetLastDecision and GetLastDecision methods allow storing and retrieving decision.

GetData method returns local data storage object for the branch. Data storages are discussed later in this article.

GaBranchGroup class represent branch groups. StartBranches method is called when the group is forked and it starts all the branches from the specified step. SetLastStep method sets additional step that all branches should execute after they finish execution of the flow. The last step is a barrier step that should join branch group. Only the first branch that executes this method, after the group was forked, has effect. All subsequent calls of this method are ignored. ExecuteBranchLastStep is called by the branch to actually execute that step. Controlling number of branches in group is possible by the SetBranchCount method. This method should be called when workflow is not running. Group also has threading barrier that can be used for synchronization of all branches in the group. GetBarrier method returns reference to the barrier. These threading barriers should be used carefully since they expect all branches in the group to hit it and it can cause deadlocks if features like branch filtering are used.

CheckBranchGroupCompatibility method checks whether the branch transition is possible to targeted branch group without causing deadlock and similar concurrency problems.

Branch groups also have their data storage object available by calling GetData method. This storage object can be accessed from the all branches in the group.

Flow that branches in the group should execute is implemented by GaBranchGroupFlow class and it is available through GetBranchGroupFlow method. SetFirstStep sets the first step that will be executed by the branches when barrier forks branch group. GaBrachGroupFlowConnection class represents connections in the flow. These connections can be used for connecting steps in the same flow. CheckConnectionValidity method implements this check.

For more details see documentation of following classes:

GaBranchFilterInfo and GaBranchFilter classes implements branch filtering. Filtering allows user to specify which branches are allowed to execute certain steps. If branch is filtered it will skip execution of step. Filter can be activated or deactivated completely by using Activate and Deactivate methods. IsActive method indicates whether the filter is active.

GetFilteredID method provides branch ID that is unique among allowed branches in the group and is assigned sequentially. ClearBranchMask, SetBranchMask, ClearAll and SetAll methods controls which branches should be filtered and which should be allowed. GetBranchMask returns whether the branch with specified ID is allowed. CanExecute method also queries the state of the filter, whether it is active or not, and branch mask to see if the specified branch should pass the filtering. SetSize is called by the library each time the number of branches in the group is changed.

GaBranchFilter extends GaBranchFilterInfo class and provides threading barrier which expects only those branches that are allowed by the filter to hit it.

For more details see documentation of following classes:

Following type of steps are implemented by the library:

  • GaSimpleWorkStep (more) – base class for standard work steps that support only one out bound connection.
  • GaNopStep (more) – class represents step that does not perform any actions.
  • GaBinaryDecision (more) – class represents step that makes decision about the further execution of flow. It has two possible outcomes. Decision must be overridden to implement decision logic. If it returns true branch fill go to step connected with connection that has ID set to 1, otherwise it will use connection with ID 0 to get the next step.
  • GaDecision (more) – class represents step that makes decision with multiple outcomes. Decision must be overridden to implement decision logic. Return value represent ID of the outbound connection that should be used to get the next step.
  • GaSimpleMethodExecStep (more) – class represents step that executes method of specified object stored in data storage. This method should be used for methods that expect pointer to the current branch as a parameter. GaMethodExecPassBranch should be set as type for METHOD_EXEC template parameter of GaSimpleMethodExecStep class, otherwise GaMethodExecIgnoreBranch should be used. SetMethod changes method that should be executed and SetObjectID method changes object on which the method should be performed.
  • GaFilteredStep (more) – base class for steps that supports branch filtering. Branch filter is exposed through SetBranchFilterInfo and GetBranchFilter methods. The class also offers option to synchronize all branches on exit of the step using threading barrier. This can be controlled by SetSyncOnExit methods.
  • GaOperationStep1 (more) – class represents step that performs operations (Common::GaOperation) on a single object from the storage that acts as input and stores output of the operation if it is required. SetSetup method sets operation’s setup that should be executed. SetIOData sets data from the storage that will be used by the operation.
  • GaOperationStep2 (more) – class is similar to GaOperationStep1 but it expects separate objects from the storage for input and output data. SetInputData and SetOutputData method allows user to set input and output data objects for the operation.
  • GaAbstractBarrier (more) – base class for all barrier-like flow steps in the library. It implements joining of multiple inbound branch groups before performing an operation which is left to implementation to classes that inherits it.
  • GaBranchGroupTransition (more) – class represent special kind of barrier that allows direct transition to from current branch group to another one. Even though this is barrier step it must be placed in branch group flow, not top-level group flow. When this step is hit by a branch, it will wait for all branches of the current group to finish execution and then it will start targeted branch group. GaBranchGroupTransitionConnection class implements connection between transition step and arbitrary step in target branch from which the branches in target group should start execution of the flow. When connection is made, check from branch group compatibility is performed.
  • GaWorkflowBarrier (more) – class is implementation of real workflow barrier. It can have multiple inbound branch groups that should be joined and multiple outbound groups that should be forked.
  • GaStartStep (more) – class represents the first step of the workflow. It is a barrier that can fork arbitrary number branch groups, but since it is the first step it cannot accept inbound connections so it cannot be used for joining branch groups.
  • GaFinishStep (more) – class represents the last step of the workflow. It is a barrier that can join arbitrary number branch groups, but since it is the last step it cannot have outbound connections so it cannot be used for forking branch groups.

Top-level flow, workflow, is implemented by GaWorkflow class. When new workflow is create it has only GaStartStep and GaFinishStep. User can build workflow by using AddStep and ConnectSteps methods. Since GaStartStep and GaFinishStep are connected after creation of workflow, they should be disconnected using RemoveConnection method. Workflow only accepts barriers for its steps and branch groups as connections between them.

Execution of the workflow can be controlled using these methods: Start, Resume, Pause and Stop. Workflow can be in one of the following states represented by GaWorkflowState enumeration:

  1. GAWS_STOPPED – workflow is stopped. It is safe to perform changes on the workflow.
  2. GAWS_PAUSED – workflow is paused. Even though it is not running it might not be safe to perform changes on the workflow
  3. GAWS_RUNNING – workflow is running. It is not safe to perform changes on the workflow.
  4. GAWS_NOT_RUNNING – combination of GAWS_STOPPED and GAWS_PAUSED. It indicates that there are no active branches that executes the workflow steps.

States of the workflow and transitions between them

Current state of the workflow can be obtained by calling GetState method. CheckWorkflowState method is used by branches to check current state and see if they should continue execution. To synchronize external code with the workflow user can use Wait method which will black calling thread until the stopped state is reached, either user called Stop method or workflow reached GaFinishStep.
Workflow exposes its storage through GetWorkflowData method. Global data storage is available through GetGlobalData method.

For more details see documentation of following classes and enumerations:

GaDataStorage class manages data for the workflows. GaDataStorageLevel defines possible levels of storage objects:

  1. GADSL_GLOBAL – data storage is available in all workflows
  2. GADSL_WORKFLOW – data storage is available only in current workflow
  3. GADSL_BRANCH_GROUP – data storage is available only in current branch group
  4. GADSL_BRANCH – data storage is available only in current branch

Currently embedding workflows is not supported so GADSL_GLOBAL and GADSL_WORKFLOW are effectively the same. This might change in the future versions of the library.

GaDataEntryBase class represent entry in data storage. It stores ID of the entry and it is responsible for tracking reference count to the data, since entry claims ownership of data. Actual data of the entry are managed by GaDataEntry class.

Retrieving data from data storage can be done using FindData and GetData methods. FindData goes through storage hierarchy, starting from lowest level specified and tries to find data with specified ID. GetData only tries to retrieve data at specified level and if it fails it will not look in storages at higher level. There are several versions of FindData and GetData methods available in GaDataStorage class. AddData and RemoveData methods do as their names suggest – adding data and removing them from a storage. All these methods are thread-safe.

To achieve reference tracking of data objects, user should use GaDataCache class. When data entry is obtained using FindData or GetData method, pointer to the entry should be sored in new instance of GaDataCache class that is responsible for incrementing and decrementing reference count when it goes out of scope.

Another feature that data storage supports is data binding implemented by GaDataBinder class. When data binding is used, library will update data in destination storage each time the data with specified ID are changed in the source storage. User needs to specify source data storage and ID of the source data on one side of the binder and destination storage and ID under which the data will be available on the other, as well as the function that defines how the destination data should be updated.

For more details see documentation of following classes and enumerations:

To make parallel execution easier, library has two classes that are used for splitting workload and distributing it among active branches. GaParallelExec and GaParallelExecT are base classes for work distributors, namely GaParallelExec1 and GaParallelExec2. The first one splits array of items and assigns each branch equal number of items to process. The second one makes pairs of items and assigns each branch an equal number of pairs to process. Branch should call Exec and provide operation that should be performed on the (pairs of) items, this method also can synchronize branches, using threading barrier. Exec method can be called multiple times using the same instance of work distributor, but if the underlying array can change size between calls, Update method should be called before executing Exec method.

Example of GaParallelExec1 work distribution

Example of GaParallelExec2 work distribution

Processing can be limited to certain range of the array by using different item providers. Providers are specified by setting template parameter of work distributors. GetCount method of item provider returns number of items in the range that should be processed and GetStart method returns index of the first item that should be processed. operator () returns item at specified index in underlying array. GetProvider method of GaParallelExecT class returns item provider that is used by the work distributor.

For more details see documentation of following classes:

There are three different item providers in the library that are available out of the box:

    GaDefaultItemProvider (more) – represents provider that returns items from the whole array.
    GaLimitedItemProvider1 (more) – represents provider that returns items from the specified index, up until the end of the array.
    GaLimitedItemProvider2 (more) – represents provider that returns items in arbitrary arrange. User has to specify start and the end position in the array.

You must be logged in to leave a reply.