Autonomous Navigation Modules are capable of driving a robotic platform without direct human participation, and it is common for more than one such module to share the same workspace. When an emergency occurs, these modules should achieve a desired formation in order to escape efficiently and avoid motion deadlock. We address the collaboration problem between two agents, namely Autonomous Navigation Modules. A new approach to team collaborative control, based on incentive Stackelberg game theory, is presented. The procedure for finding the incentive matrices is provided for the case of geometric trajectory planning and following. A collaborative robotic architecture based on this approach is proposed. Simulation results obtained with two virtual robotic platforms show the efficiency of the approach.

When multiple autonomous robotic platforms are involved in a common task, team coordination is usually an important control aspect. Robots in the team make their own decisions, which may conflict with those of their teammates [1,2]. In particular, the coordination problem must be handled efficiently when two Autonomous Navigation Platforms sharing the same configuration space need to achieve a team formation in order to avoid motion deadlocks during a task such as moving through a single waypoint, cleaning a floor, transporting a load or tracking a common target [

The motion-deadlock avoidance problem has been studied in the context of robot soccer, where a fuzzy-logic approach was used to derive the action of each robotic teammate [

Game theory is among the best-known frameworks for studying robotic platform team collaboration. When no natural decision-making hierarchy can be established among teammates, the Nash equilibrium approach is often used [

This paper studies the collaboration of two autonomous agents within a game-theoretic framework. The agents are heterogeneous Autonomous Navigation Modules sharing a common workspace [

As stated before, game theory offers a clear formulation for finding an equilibrium point in situations where several decision makers (agents) are involved [

This paper provides two contributions. We propose a method for finding the solution of an important class of discrete-time two-agent nonzero-sum dynamic games with linear state dynamics and quadratic cost functionals. Our solution is an extension of the one proposed by [

The rest of the paper is organized as follows. The path-planning problem in a linear space and the methodology based on Stackelberg game theory are presented in Section 2. In Section 3, the reactive robotic architecture is presented. The simulation results and the conclusion are presented in Sections 4 and 5, respectively.

Consider a system whose state gathers the configurations of two Autonomous Navigation Modules (ANMs) and is governed by the following state equation:
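A standard linear form consistent with the definitions that follow (all symbol names are assumed) is:

```latex
X(n+1) = A(n)X(n) + B_1(n)U_1(n) + B_2(n)U_2(n), \qquad n = 0, \ldots, N-1
```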

where:

n is the current stage; X(n) is the state vector of the system at stage n; U1(n) and U2(n) are the control signals, or strategies, generated at stage n by agent 1 (ANM 1) and agent 2 (ANM 2), respectively, and both are assumed to have the same dimensions; each Ui(n) belongs to the admissible strategy set of agent i; A(n) is the transition matrix of the system at stage n; B1(n) and B2(n) are the control matrices of agent 1 and agent 2, respectively.

Each agent i is given the following functional:
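A quadratic form consistent with the matrix definitions that follow (symbol names assumed) is, for agent i with j ≠ i:

```latex
J_i = \left(X(N)-\xi_i(N)\right)^{\top} Q_i(N)\left(X(N)-\xi_i(N)\right)
+ \sum_{n=0}^{N-1}\Big[\left(X(n)-\xi_i(n)\right)^{\top} Q_i(n)\left(X(n)-\xi_i(n)\right)
+ U_i(n)^{\top} R_{ii}(n)\,U_i(n) + U_j(n)^{\top} R_{ij}(n)\,U_j(n)\Big]
```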

N is the finite optimization horizon; subscripts i and j, with i ≠ j, represent the two different agents; ξi(n) is the vector of the reference trajectory followed by agent i at stage n; ξi(N) is the vector of the reference trajectory of agent i at the end of the optimization horizon; Qi(n) is a symmetric positive semi-definite matrix that penalises the deviation between the state vector and the reference vector at stage n; Rii(n) is a symmetric positive definite matrix that penalises the control signal of agent i at stage n within its own functional; Rij(n), with i ≠ j, is a symmetric positive definite matrix that penalises the control signal of agent j at stage n within the functional of agent i.

We consider only the case where the state vector is fully accessible to all agents and the initial state vector is completely known. Furthermore, we assume that agent 1 is the leader and agent 2 is the follower.

At each stage, each agent selects the strategy that minimizes its functional. In a state-feedback Stackelberg game formulation, the leader wishes to influence the follower so that the selected strategies minimize the leader's functional. The leader and follower strategies that minimize the leader's functional are, by definition, the team-optimal strategies, and the state trajectory obtained by applying them is denoted X*(n). To influence the follower's strategy selection, we assume that the leader uses a linear incentive function represented by the following equation [
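A common linear incentive parameterization (the gain names K1(n), K2(n) and the starred team-optimal quantities are assumed notation) announces the leader control as an affine function of the state and of the follower control:

```latex
U_1(n) = U_1^{*}(n) + K_1(n)\left(X(n) - X^{*}(n)\right) + K_2(n)\left(U_2(n) - U_2^{*}(n)\right)
```

If the follower plays U2 = U2* along X = X*, the leader plays exactly its team-optimal control, which is the defining property of an incentive strategy.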

The problem is to find, at each stage, the two matrix gains of the incentive function such that the resulting state-feedback control achieves a Stackelberg solution. In the remainder of the paper, the terms control signal and strategy are used interchangeably.

In general, for two agents involved in a dynamic Stackelberg game, a solution concept is a pair of strategies, one per agent, that minimizes both functionals at each stage. This notion of equilibrium is extended to allow the definition of the feedback Stackelberg solution [

The strategy of the leader is chosen such that:

The pair of strategies from Equations (6) and (7) is the Stackelberg solution with agent 1 as the leader. We consider that the mapping is a linear function represented by Equation (5). In order to completely define this function, its matrix gains must be determined based on optimal control theory.

Since the goal of the leader is to induce the follower to choose a strategy that minimizes the leader's functional, we need to determine these two strategies, called the team-optimal strategies. The team-optimal strategies are defined as:

Assume that the leader knows the follower's functional. In order to incite the follower to adopt the team-optimal strategy, the leader uses the strategy represented by Equation (5). The follower, in order to minimize its own functional and find its strategy, takes this announced leader strategy into account; from the follower's side, this is a standard optimization problem. Since the two incentive matrix gains are part of the optimal strategy of the follower, the leader needs to provide them. For the leader, this is not a simple optimization problem, because the leader must take into account the expected rational strategy of the follower.

It is assumed that both agents minimize the leader's functional represented by Equation (2). To find the pair of strategies that minimizes it, optimal control theory is applied.

where arg min stands for the arguments at which the functional attains its minimum value. The team Hamiltonian of the system is given by:

where:

Using the minimum principle, we obtain the following expressions:
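For a team Hamiltonian of the form H(n) = (stage cost) + λ(n+1)ᵀ(A(n)X(n) + B1(n)U1(n) + B2(n)U2(n)), the discrete minimum principle yields stationarity and costate conditions of the following shape (λ is an assumed name for the costate):

```latex
\frac{\partial H(n)}{\partial U_1(n)} = 0, \qquad
\frac{\partial H(n)}{\partial U_2(n)} = 0, \qquad
\lambda(n) = \frac{\partial H(n)}{\partial X(n)}
```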

and

and

Equation (12) becomes:

with the boundary condition:

From Equations (13) and (14) the following expressions are obtained:

The state Equation (1) becomes:

From the boundary condition (16), it seems reasonable to assume, for all stages n:
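The assumed affine form of the costate, with S(n) the matrix and s(n) the vector mentioned below, is:

```latex
\lambda(n) = S(n)X(n) + s(n)
```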

where S(n) is a matrix and s(n) is a vector. Substituting its expression (Equation (20)) into Equation (19), the following equation is obtained:

where:

and I is an identity matrix with proper dimensions. From Equation (15), the following expression is obtained by substituting the relevant terms with their expressions:

The term from Equation (23) is then replaced by its expression, and the following expression is obtained:

Since Equation (24) must hold for all X(n) given any X(0), we must have:

and

Rewriting Equations (25) and (26), the following expressions are obtained:

and

with the following boundary conditions:

Given the preceding expressions, the solution is completely determined. Hence, the team-optimal strategies are represented by:

where:

I is an identity matrix with proper dimensions.

Incentive Matrix Gains

To incite the follower to adopt the team-optimal strategy, the leader advertises its strategy represented by Equation (5). We assume that the leader has full knowledge of the follower's reference path. Hence, given:

The follower's reaction is found by solving its Hamiltonian:

where:

Using the minimum principle, we obtain the following expressions:

and

Equation (40) becomes:

with the boundary condition:

where the state sequence is the one obtained when the announced strategies are applied to the system [

Assume that

where the introduced gains are matrices with proper dimensions;

is the sequence of state vectors obtained when the announced strategies are applied to the system. We know that:

Given this expression, Equation (44) can be rewritten as follows:

Substituting expression (46), the following equation is obtained:

If the follower acts exactly as the leader expects, the realized strategies and state sequence are equal to their team-optimal counterparts.

Hence, expression (46) becomes:

If the previous equation holds for any initial state, we must have the following conditions:

and

If both conditions hold, then the follower strategy is equivalent to:

To be able to compute the follower's strategy, the remaining quantities need to be evaluated. Consider the state equation when the announced strategies are applied:

From Equation (45), the following expression is deduced:

Substituting Equation (57) in Equation (58) yields:

Substituting Equations (45), (46) and (57) in (42), we obtain the following equation:

Substituting expression (59) into Equation (60) yields:

This equation is true for any state and any control if the following conditions hold:

1. for all;

2. for all;

3. for all constant values.

Substituting the corresponding expression into Equation (62) yields:

The algorithm to solve the feedback Stackelberg game for trajectory following is summarized as follows:

Backward processing:

1. Find all required sequences by using Equation (26);

2. Find all required sequences by using Equation (27);

Forward processing: at each step,

1. find by using Equation (33);

2. find by using Equation (34);

3. find by using Equation (35);

4. find by using Equation (36);

5. find from Equation (50);

6. find from Equation (65);

7. find from Equation (63);

8. find from Equation (51);

9. find from Equation (64);
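The backward/forward structure above can be sketched, for the team-optimal (joint-control) subproblem, as a finite-horizon LQ tracking recursion. This is a minimal illustration under assumed notation, not the paper's exact equations: the two agents' controls are stacked into one vector, and S, s, K, k name the value-function and gain quantities of a standard Riccati-style derivation.

```python
import numpy as np

def solve_team_lq_tracking(A, B, Q, R, Qf, refs, x0):
    """Backward processing: recursion for the value-function pair
    (S(n), s(n)) and the gains (K(n), k(n)); forward processing:
    roll the system out with U(n) = -K(n) X(n) - k(n).
    refs is the list [xi(0), ..., xi(N)] of reference states."""
    N = len(refs) - 1
    S, s = Qf, -Qf @ refs[N]           # boundary conditions at stage N
    Ks, ks = [], []
    for t in range(N - 1, -1, -1):     # backward processing
        W, w = S, s
        M = R + B.T @ W @ B
        K = np.linalg.solve(M, B.T @ W @ A)
        k = np.linalg.solve(M, B.T @ w)
        Abar = A - B @ K
        S = Q + A.T @ W @ A - (B.T @ W @ A).T @ K
        s = -Q @ refs[t] + K.T @ R @ k + Abar.T @ (w - W @ B @ k)
        Ks.append(K)
        ks.append(k)
    Ks.reverse()
    ks.reverse()
    xs, us = [x0], []                  # forward processing
    x = x0
    for t in range(N):
        u = -Ks[t] @ x - ks[t]
        us.append(u)
        x = A @ x + B @ u
        xs.append(x)
    return np.array(xs), np.array(us)

def tracking_cost(xs, us, Q, R, Qf, refs):
    """Quadratic tracking cost matching the recursion above."""
    J = 0.0
    for t in range(len(us)):
        e = xs[t] - refs[t]
        J += e @ Q @ e + us[t] @ R @ us[t]
    e = xs[-1] - refs[-1]
    return J + e @ Qf @ e

# Joint (team) control: both agents' inputs stacked, B = [B1 B2].
dt = 0.1
A = np.eye(3)
B = dt * np.hstack([np.eye(3), np.eye(3)])
Q, R, Qf = np.eye(3), np.eye(6), 10.0 * np.eye(3)
refs = [np.array([1.0, 2.0, 0.5])] * 51
xs, us = solve_team_lq_tracking(A, B, Q, R, Qf, refs, np.zeros(3))
```

Since the joint problem is convex quadratic, the rolled-out control sequence is the global minimizer of the tracking cost for the given initial state.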

The generic reactive architecture is depicted in

The reference paths are generated by the leader reference-path generator and the follower reference-path generator, respectively.

Given the reference paths, the Stackelberg formulation can take place by considering that the leader reference path is the team reference trajectory. For the collaborative navigation application, the leader reference trajectory is the same as the follower trajectory unless an unknown obstacle is detected. As shown in

For each planned stage, the two signals generated by both agents are given directly to the position controller. This module is responsible for applying the required low-level control signals to the platform effectors so that the platform configuration tends to be as close as possible to the given configuration. The position controller causes the platform to change its configuration during a stage. The obtained platform configuration is used as feedback by the two Stackelberg solvers in order to generate the next stage's control signals.

The described architecture fulfills the minimum requirements stated by Hoc [

To validate all the steps required for collaborative control based upon the feedback Stackelberg theory, a simulation has been performed. The focus is on the planning layer of the generic architecture presented in

Assume that the two agents are at point A and that their platforms are considered as a single team platform. The goal of the simulated collaborative navigation is to drive the team platform from point A to point B, as shown in

The team-platform state and the simulation parameters are described below.

where:

where:

represents the system integration time step;

represents the leader control signal along the x-axis;

represents the leader control signal along the y-axis;

represents the leader control signal related to the orientation;

represents the follower control signal along the x-axis;

represents the follower control signal along the y-axis;

represents the follower control signal related to the orientation.

The leader and follower functionals are represented by Equation (2). We assume that the control signals are not bounded. The involved functional matrices are defined as well-dimensioned identity matrices, except for a few weighting matrices that are set to specific values. The optimization horizon is set to the whole number of simulation stages.
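Under the assumption (consistent with the description above) that the team state is [x, y, θ] with single-integrator dynamics and integration step T, the simulation matrices can be sketched as follows; the value of T is not given in the text and is chosen here for illustration only:

```python
import numpy as np

T = 0.05  # assumed integration time step (the actual value is not stated)

# State [x, y, theta]; each agent contributes a velocity command per axis.
A = np.eye(3)        # transition matrix: simple discrete integrator
B1 = T * np.eye(3)   # leader control matrix (vx, vy, omega)
B2 = T * np.eye(3)   # follower control matrix (vx, vy, omega)

def step(x, u1, u2):
    """One stage of the team-platform state equation."""
    return A @ x + B1 @ u1 + B2 @ u2

x0 = np.zeros(3)
x1 = step(x0, np.array([1.0, 0.0, 0.0]), np.array([1.0, 0.0, 0.0]))
```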

In

Figures 5-7 show the agents' control signals during the first phase. Since the reference paths are identical, the leader's contribution along each axis is small, meaning that the incentive part of the leader's control signal is also small. This result makes sense, since the follower is acting as the leader wishes.

The second phase starts at stage 701. During this phase, the leader needs to make use of the incentive strategy in order to induce the follower to track the leader's reference path instead of its own, as shown in Figures 8-10.

A new collaborative architecture is presented in this paper. It is based upon the incentive Stackelberg game formulation and the three-layer architecture. The proposed method is suitable for applications in which there is a hierarchy between decision makers. All conditions and equations required to find the incentive matrices have been provided, together with an algorithm for solving the Stackelberg problem for a class of discrete-time two-agent nonzero-sum dynamic games with linear state dynamics and quadratic cost functionals. The feasibility and validity of the architecture are demonstrated through the study of collaborative path planning for two robotic platforms. In a completely deterministic framework, the results suggest that the optimal solution of this game can be obtained. The proposed method, as well as the collaborative architecture, could be used for collaborative control of smart-wheelchair teams and unmanned-vehicle teams.

The author wishes to thank Prof. Paul Cohen of École Polytechnique Montréal, Québec, Canada.