APP下载

Energy-Efficient Underwater Data Collection:A Q-Learning Based Approach

2022-10-12HaiyanZhaoJingYanTaoWuAihongLiandXiaoyuanLuo

Haiyan Zhao, Jing Yan, Tao Wu, Aihong Li and Xiaoyuan Luo

Abstract Underwater data collection is an importance part in the process of network monitoring,network management and intrusion detection. However, the limited-energy of nodes is a major challenge to collect underwater data. The solution of this problem are not only in the hands of network topology but in the hands of path of autonomous underwater vehicle(AUV).With the problem in hand, an energy-efficient data collection scheme is designed for mobile underwater network.Especially, the data collection scheme is divided into two phases, i.e., routing algorithm design for sensor nodes and path planing for AUV. With consideration of limited-energy and network robustness, Q-learning based dynamic routing algorithm is designed in the first phase to optimize the routing selection of nodes, through which a potential-game based optimal rigid graph method is proposed to balance the trade-off between the energy consumption and the network robustness. With the collected data, Q-learning based path planning strategy is proposed for AUV in the second phase to find the desired path to gather the data from data collector, then a mode-free tracking controller is developed to track the desired path accurately. Finally, the performance analysis and simulation results reveal that the proposed approach can guarantee energy-efficient and improve network stability.

Keywords Underwater data collection;Q-learning;Energy efficient;Path planning;Autonomous underwater vehicle

1 Introduction

Underwater cyber-physical system (UCPS) is a new complex system with high efficiency communication and effective control. It is widely used in various underwater engineering and research fields,such as resource protection,ocean environment monitoring, and early warning system of natural disaster(Gong et al.2020;Lin et al.2019;Wang et al.2015).In order to support these applications,the data acquisition of underwater environment becomes a primary task to be solved. To handle this problem, internet of underwater things (IoUT), as the core components of UCPS,has attracted the interest of researches, which has advantages of low cost, rapid deployment, and self-organization(Cashmore et al.2018;Li et al.2019;Yan et al.2020a).

Different from ground environment, the underwater environment has many special characteristics, which poses a substantial challenge to the underwater data acquisition.Firstly, the robustness of networks is more demanding since IoUT is a sparse network. To improve the network robustness, a common method is to improve network connectivity. However, a good connectivity can increase the energy consumption of network, which leads to shorten network lifetime. Secondly, the power of nodes in IoUT is difficult to recharge since the nodes are powered by batteries (Lin et al. 2016;Yan et al. 2021a, 2021b). Thirdly, the high-frequency radio waves are strongly absorbed in underwater environment, which cannot used in underwater scenario. Till now, the acoustic wave as the only available signal for underwater communication,but the transmission rate of acoustic wave is 1.5×103m/s, which is much slower than radio wave (Li et al. 2016b; Su et al. 2019;Yan et al.2020b).Due to these challenges,the data collection schemes proposed for terrestrial sensor network are no longer suitable for IoUT.

Up to now, underwater data collection schemes can be divided into two categories: data collection by multi-hop transmission and data collection by autonomous underwater vehicle (AUV). To be specific, multi-hop transmission mainly transmits the data to the receiver (or sink node) on the water surface through a series of routing mechanisms.However,the nodes that near to the receiver consume energy quickly. Thereby, there is an imbalance of energy consumption in multi-hop transmission, which is easy to lead to the emergence of energy voids(Ren et al.2016).For instance,Xie et al.(2006)firstly designed a vector-based forwarding (VBF) protocol, which can provide robust, scalable and energy-efficient routing. Besides that, a poweraware data collection algorithm was developed (Dhurandher et al. 2009). However, these literature were not suitable for sparse networks due to substantial energy consumed by this protocol via flooding.To reduce energy consumption, Wu et al. (2021) provided a Hierarchical Adaptive Energy-efficient Clustering Routing (HAECR) strategy to prolong the network lifetime. In Ali et al. (2014), a novel routing strategy named Layer by layer Angle-based flooding (L2-ABF) was provided under the constraints of movement of nodes, delays and energy consumption. Inspired by Xie et al.(2006),an adaptive hop-by-hop vectorbased forwarding routing (AHH-VBF) protocol was designed (Yu et al. 2015). Some other multi-hop transmission based data collection schemes were presented (Jin et al.2017;Wei et al.2013).Although these schemes can achieve energy efficient, the robustness of the network is not considered.For the second category,the data collection task is completed by assistance of AUV. In Khan and Cho (2014;2016), multiple AUVs assisted data collection method was proposed. Besides that, Han et al.(2019) developed a prediction-based delay optimization data collection method to solve the issue of data collection delay. In Duan et al.(2020),the value of information was utilized as a metric to measure the quality of information. In view of this, a reliable hierarchical information collection system was constructed for data collection. In addition, Cai et al. (2019)applied the mobile edge elements into data collection algorithm, whose results were helpful to solve the problem of unbalanced energy consumption. In Han et al. (2018), a stratification-based data collection algorithm was proposed for underwater data collection with constraints of water delamination and limited-energy.Although these algorithms help to improve the energy efficient, how to improve the robustness of network has not been well studied.We notice that topology is a good way to balance network connectivity and energy consumption. Referring to Shames and Summers (2015), we found that the rigid characteristics of the network are closely related to its algebraic properties. In Luo et al. (2018), the potential-game based optimally rigid topology control algorithm was proposed to improve the network performance. The rigid graph based topology optimization algorithm was employed to balance the trade-off between energy consumption and network connectivity in Yan et al. (2016). Although the works in Luo et al.(2018)and Yan et al.(2016)demonstrate a good trade-off between energy consumption and connectivity maintenance, the relationship between topology optimization and routing strategy has not been studied. With regards to limited-energy and network connectivity, how to adopt the topology optimization technology to balance the trade-off between the energy efficient and network connectivity for routing algorithm is still an open issue.

With the assumption of the data is collected by data collector, some path planning schemes have been proposed for AUV.For instance,the path planning problem of multiple robots was studied for data collection in Bhadauria et al. (2011). In McMahon and Plaku (2017), the samplingbased motion planning were incorporated into constraintbased solvers to achieve autonomous data collection. Nevertheless, the methods in Bhadauria et al. (2011) and Mc-Mahon and Plaku (2017) only consider the effect of geographical location on AUV data collection, the validity of the data is not considered. Since the data generated by the nodes has associated values that decay over time,it is necessary to consider the value information of the collected data. Based on this, the timeliness of the data was used to plan the path of AUV in Gjanci et al. (2018), Liu et al.(2022), then maximize the effective value of the total data collection.Our previous work(Yan et al.,2018)developed a dynamic value-based path planning method for AUV to maximize the value of information. However, the obstacle avoidance is not addressed. With consideration of geographical location of nodes, timeliness of the data and obstacle,how to design the path planning and tracking strategy of AUV is another problem to be solved.

With these problems in hand, an AUV-assisted energyefficient data collection scheme is proposed in this paper.Two phases data acquisition algorithm is designed, i.e.,routing algorithm design for sensor nodes and path planing for AUV.In the first phase,the optimal rigid graph was incorporated into dynamic routing algorithm for sensor nodes to transmit the data to data collector.Of note,the proposed method can balance the trade-off between the energy consumption and the network robustness.Subsequently,the Qlearning based AUV path planning strategy is developed in the second phase to periodically accesses the data collector and periodically resurfaces the water to unload the collected data to the control center.In particular,the main contributions of this paper are as follows:

1) Routing algorithm design for sensor nodes. The potential game based optimal rigid graph is designed to balance the trade-off between the energy consumption and network connectivity,and then a Q-learning based dynamic routing algorithm is provided for sensor nodes to relay the collected data to data collector. Compared with the existing works (Li et al. 2016a;Yan et al. 2018), the routing algorithm in this paper can prolong the network lifetime and improve the robustness of network.

2) Path planning and tracking for AUV. A Q-learning based path planning algorithm is developed for AUV to guide the movement of AUV,and then a proportional-integral-derivative (PID) based tracking algorithm is proposed to track the desired path.Compared with the existing work(Han et al.2017),the path planning algorithm in this paper can improve the information value of the total data collection with a time limit and avoid the influence of environmental obstacles.

2 System model and problem formulation

2.1 System model

As depicted in Figure 1,a network including sensor nodes,data collectors and AUV is designed for data collection.

• Sensor Nodes. The role of sensor nodes is to perform underwater monitoring tasks, whose clocks are synchronized and the coordinates are accurately known. It should be emphasized that, sensor nodes can move passively due to the effect of current.

• Data Collectors.Data collectors are static nodes,whose function is to gather data from sensor nodes through some routing mechanisms.Particularly,the locations of data collectors are fixed with the anchor wire constraints.

• AUV. An AUV is deployed to visit data collectors,through which the data can be retrieved by high speed visible light communications. Meanwhile, AUV periodically surfaces to offload the retrieved data to control center via high speed radio communications.Without loss of generality,we assume that the batteries equipped on AUV are sufficient.

Similar to Yan et al. (2018), the underwater monitoring area is divided intoMsubareas. It is noted thatNsensor nodes and one data collector are deployed in each subarea.For an arbitrary subarea, sensor nodes collect data from the surrounding environment and send the collected data to data collector through single-hop or multi-hop communication.Without loss of generality,we assume that a setEof eventsE1,…,E|E|occur in the underwater monitoring area, where |E| is the total number of events. When sensor nodei∈{1, …,N} has sensed an eventEkat timetk,i, the value of information (VoI) CEk,ifor the sensed data on eventEkcan be acquired by sensori, where 1≤k≤|E|.Specially,the VoI CEk,iat timetis defined as

Figure 1 Description of the network architecture,where an AUV is employed to retrieve data from multiple data collectors

whereFEk∈R+ande-(t-tk,i)denote the event importance and timeliness ofEk,respectively.0≤βk≤1 represents the information weight,whose role is to balance the trade-off between importance and timeliness.It is noted that,the event importance can be designed and modified according to the monitoring levels. Meanwhile, the event timeliness is a monotonically decreasing function, which is to captures the temporal decay of the sensed data. This is intuitive as later retrieve of a data segment can lead to higher losses.

Different from the terrestrial networks, the sensor nodes in IoUT move passively since the effect of current. Based on this, a meandering current mobility model (Xia et al.2017) is utilized to describe the movement of sensor nodes. Accordingly, the movement of sensor nodei∈{1,…,N}can be updated as

wheredis the transmission distance between one sensor node and its next-hop node,SL∈R+denotes the sonar source level,l→denotes the transmission loss range,αrepresents the absorption coefficient in dB/km, TL denotes the transmission loss, andTtxis the transmission time of a single packet.

It is noted that, the body-fixed reference frame (BRF)and inertial reference frame (IRF) are adopted to describe the AUV dynamics. Due to the intrinsic stability of rolling and pitching of open-frame vehicles (Caccia and Veruggio 2000; Kim et al. 2016), this paper presents a simplified model including four degrees of freedom(4-DOF),i.e.,neglecting pitch and roll motions. Based on this, the dynamics of AUV is formulated as

wherex,y, andzare the positions onX,YandZaxes, respectively.φdenotes the angle in yaw.

2.2 Problem formulation

Problem 1(Routing strategy design for sensor nodes)Since the complexity of underwater environment, such as coral reefs and fish stocks, it affects data transmission and requires higher robustness to networks. Meanwhile, the battery power installed on underwater sensor node is limited and cannot be recharged.In order to avoid the energy holes and prolong the network lifetime, the optimal rigid graph is incorporated into dynamic routing algorithm to relay the data from sensor nodes to data collector. This problem is reduced to optimize the routing selection of sensor nodes with the consideration of energy holes and network robustness.

Problem 2 (Path planning and tracking for AUV) In mobile IoUT, data collectors gather data from sensor nodes,but the collected data is decaying with time. To maximize the VoI of data delivered to control center, we aim to design a Q-learning based path planning strategy for AUV to find the desired path, which can avoid environmental obstacles effectively. Of note, the AUV dynamics are highly complex, and the exact dynamic parameters of AUV are not easy to acquire due to the influence of ocean currents.To solve this issue, a model-free tracking tracking controller is sought to be provided,such that AUV can accurately track the desired path.

3 Energy-efficient data collection approach

In this section, the energy-efficient data collection approach is developed.As shown in Figure 2,the approach is divided into two phases, i.e., 1) routing strategy for sensor nodes; 2) path planning and tracking for AUV. The detailed process of the two phases is described below.

Figure 2 Framework of energy-efficient data collection approach

3.1 Routing strategy for sensor nodes

Without loss of generality,the routing strategy of sensor node is divided into two stages, i.e., optimal rigid graph design and dynamic routing algorithm design. In the following,the routing strategy for sensor nodes is given.

3.1.1 Optimal rigid graph design

For ease of description, an arbitrary subareaS1is taken as an example. Without loss of generality, subareaS1can be modelled as a graph G(V, E), in which V={1, 2, …,N,N+1}is the set of sensor nodes and data collector in subareaS1,and E={(i,j)∈V×V:i≠j}is the edge set.Of note,each edge in the graph is expressed by a different pair of vertices (i,j). If any vertex pair (i,j) of graph G(V, E) satisfies∀(i,j)∈E⇒(j,i)∈E, then graph G(V, E) is called an undirected graph, otherwise it is called directed graph. Especially, graph G(V, E) is called rigid graph if vertex is removed does not destroy the stability of graph. Otherwise,it is called a flexible graph.It is noted that,it is called minimum rigid graph if the rigid graph becomes flexible when any an edge is removed from rigid graph.For a clearer description, examples of a flexible graph, a rigid graph, and a minimum rigid graph are given in Figure 3.Accordingly,the following definition and lemma are given.

Figure 3 Example of flexible and rigid graphs

Definition 1 (Zhang et al. 2015) Defining R(G)∈R|E|×|3V|is the rigidity matrix of a graph G(V, E) withN+1 vertices in 3D space.The graph is called a infinitesimally rigid graph if and only if the rank of R(G)is 3(N+1)-6,i.e,rank(R(G))=3(N+1)-6.

Lemma 1 (Luo et al. 2013) Gi(i=1, 2, …,N+1) is a minimum rigid graph,which composes of nodeiand its neighborhood. G (V, E)= ∪i=1,…,N+1Gi. Denote the generated optimal rigid graph as Gopr(V, Eopr). Then, the following statements hold:a)e∉Eopr,ife=(j,k)∈E,j,k∈Gi,ande∉Gi;b)The optimal rigid graph Gopr(V,Eopr)can be obtained by deleting those edges satisfying a).

The aim of this section is to balance the trade-off between the network stability and energy consumption by optimizing network topology. Of note, each node (i.e., data collector or sensor node) in the network is independent and affects each other.Based on this,the game theory is introduced to optimize the network.Without loss of generality, a strategy game consists of three elements, i.e., Player I, Strategic Space A, and Utility Function U. Thus, the strategy game is expressed asΓ{I, A, U}. In the following, the three elements of strategy game are introduced in detail:1)Player I∈{1,2,…,i,…,N+1},whereN+1 is the total number of players in game; 2) Strategic Space A:A=Πi∈IAiis the set of all possible combinations of policies for all players to choose.a=(ai,a-i) is the strategic of all players,whereai∈Aiis the strategy selection by playerianda-iis the strategic selection of players other than playeri;3)Utility Function U:A→ℝ fori∈I is mappings. Based on this,the following definitions are given.

Then,the gameΓ{I,A,U}is called OPG,and the functionU(a)is called OPF of the gameΓ{I,A,U}.

Based on the above definitions, the following two aspects are considered to construct the utility function.

Therefore, the utility function of gameΓ{I, A, U} can be constructed as

whereX(G,ai)is the vertex rigidity gramian with the strategyai.

In view of (11) and Lemma 1, the potential game based optimal rigid graph can be generated, and the corresponding pseudo codes are described in Algorithm 1. It should be emphasized that the Algorithm 1 contains two principles, i.e., 1) keeping the network is a minimum rigid graph, and 2) maximizing the trace of the vertex rigidity gramian.

Algorithm 1 Potential game based optimal rigid graph Input:A graph with N+1 vertices that are labelled as 1,2,…,N+1,the position of all nodes p ={p1 1, p2 1, p3 1, …, p1 i, p2 i, p3 i, …, p1 N + 1,p2 N + 1,p3 N + 1},the optional strategies spaces Ai;Output: The optimal rigid graph with the strategy a*a*i={a*1, a*2, …,N + 1};1:The game implements based on the node ID, ȧi= aci, ac i is the optional strategies for ∀i∈V;2:Ȧ={ȧ1,ȧ2,…,ȧN + 1};3:for i∈V do 4: if|Ȧi|≤3(N+1)-6 then 5: ȧi=argmaxai ∈AiUi(ai,a-i);6: |Ȧi|=|Ȧi|+1;7: end if 8:end for

3.1.2 Dynamic routing algorithm design

Algorithm 2 Q-learning based dynamic routing algorithm 1: Set the reward function matrix R according to optimal rigid graph in Algorithm 1;2:Initialization:Q=0,α and γ;3:for i=1:N do 4: for each episode do 5: Randomly select an initial state s1l;6: if s1l is not the data collector then 7: Use action a1 l to get the next state s1 8: Calculate Q(s1 l,a1 l)l + 1;by(13);9: s1 l←s1 l + 1;10: l=l+1;11: end if 12: end for 13: Popt(i)=arg max(Q);14:end for 15:Return optimal routing Popt of sensor nodes.

Therefore, the Q-learning based dynamic routing algorithm is given in Algorithm 2. It is noted that the sensor node selects the node with the maximum Q value as the next hop to forward data based on the Q table.The sensor node can transmit the data to data collector by shortest distance.Even if one transmission path is interrupted,the sensor node can transmit data to data collector successful.Therefore, it can guarantee to prolong network lifetime and improve the network reliability.

3.2 AUV path planning strategy

With the routing strategy for sensor nodes in Section 3.1,the data collector can collect the data from sensor nodes successfully. Define a binary variablekk,iwithi∈{1, …,N},kk,i=1 if sensorisense the eventEk,otherwisekk,i=0.Similarly, a binary variablelj,iwithj∈{1, …,M} is defined to express whether data collectorjcan obtain the VoI of sensori,i.e.,lj,i=1 if the data collectorjcan receive the VoI of sensori,otherwiselj,i=0.Accordingly,the VoI of sensoriand VoI of data collectorjfrom sensors can be calculated as

wheret∈[0,T],Dj(t) is the distance between current position of AUV and data collectorj.The purpose of subtraction ofDj(t) is to reduce the distance of access.αˉ∈R+is a scalar.In view of(16),the income function of AUV can be expressed as

Algorithm 3 Q-Learning based AUV Path Planning Input:The VoI of data collector RCj and the position information Dj(t)are received by AUV at time t=tin,where j∈1,2,…,M;Output:The dynamic path planing of AUV;1:Initialization:QA=0,α1 and γ1;2:while tin

Based on (17), one knows that AUV dynamically select the data collector with the largest income function as the destination at the next time.Due to the obstacles of the underwater environment, the Q-learning based path planning strategy is adopted to plan the optimal path of AUV for data collection. At the beginning, the AUV submerge to the depth of the data collector with the largest income function.Then, the current position of AUV is set as the initial state and the data collector with the largest income function is set as the destination.Of note,the area is uniformly divided into M=n×nlattices, in which each lattice corresponds to a state and its corresponding Q value is zero.Based on this, there are M states. Different from Section 3.1, each state in this section has four optional action, i.e.,action set includes moving toward east, west, north and south. To evaluate the performance of actions, the reward functionRA(s2l,a2

l)is defined as

Similarly, the pseudo codes of Q-learning based path planning strategy of AUV is presented in Algorithm 3.It is noted that the information value of the data collector, the effect of geographical location and environment obstacles are all considered.

According to Algorithm 3, the dynamic path planing of AUV can be obtained, i.e., the desired trajectory of AUV is acquired. Since the accurately dynamic parameters of AUV is difficult to obtain,the model-free tracking controller is developed to drive the AUV to track the desired trajectory. Considering that the PID controller is simplicity,easy implementation, excellent performance and meeting the demand of tracking, the PID tracking controller is employed in this paper for tracking.Without loss of generality, the trajectory of AUV is defined asXAUV=[x,y,z]Tand the desired trajectory of AUV is denoted asXr=[xr,yr,zr]T.Thus,the position tracking error is computed ase=Xr-XAUV.DefineF=[Fu,Fv,Fw]T.To ensureXAUV→Xrwithin a limited time interval,the PID tracking controller is designed as

whereKpis proportional gain,KIis integral gain andKdis derivative gain. Besides that, Δe(t)=e(t)-e(t-1) denotes tracking velocity error.

Obviously, the PID tracking controller does not require model information of AUV. In view of (Zhao and Guo 2017), the PID tracking controller can guarantee that the AUV can accurately track the desired trajectory by adjusting the parameters ofKp,KIandKd.

4 Performance analysis

Clearly, the energy consumption of the sensor nodes is used to receive data and transmit data. According to (3),one knows that the energy consumed by transmitting data is mainly related to distance between receiver and transmitter. We assume that the number of data packets is received by each node is same,then the energy consumption problem of sensor nodes is mainly determined by the energy consumed by transmitting data.

According to Algorithm 1, the weight of energy balance is maximized with maximizing the weight of rigid matrix.Thus, the transmission energy consumption of the sensor node is minimized. Due to the transmission energy consumption is mainly related to the transmission distance, the communication link of sensor node is minimized. Clearly,this proof also suitable for other subarea.

Theorem 3 The game modelΓ{I,A, U} for optimal rigid graph is an OPG.The functionU(a):A→→ℝ is an OPF ofΓ{I,A,U}.

Proof The ordinal potential function is defined asU(ai,a-i)=

According to Definition 2 and Definition 3, one can obtained that the game modelΓ{I, A, U} for optimal rigid graph is an OPG.The functionU(a):A→ℝ is an OPF ofΓ{I,A,U}.The proof is completed.

Theorem 4 Q-learning-based dynamic routing algorithms do not have data loops in the process of data fusion.

Proof According to (12), the reward function is set to the maximum reward valueRmaxwhen the next state is the data collector,the reward function is set to a negative value-Rmaxwhen the two nodes are not communication,and the reward function is set to -di,jwhen the nodes are communication but the next state is not the data collector.According to(13),we know that the Q value is higher when the communication link is close to the data collector.Therefore,the consistency of the transmission direction can be guaranteed.

5 Simulation results

In this section, simulation results are presented to prove the effectiveness of the proposed algorithm. Without loss of generality, a monitoring area of size of 250 m×250 m×250 m is divided intoMsubareas,Nsensor nodes and one data collector are deployed in each subarea.In each subarea, the data is collected by sensor nodes from surrounding environment. Subsequently, sensor node sends data to the data collector by single-hop or multi-hop manner. Of note,the number of sensor nodes and subareas are mainly determined by monitoring performance requirements.The more the number of sensor nodes and subareas, the higher the accuracy of data collection, but the higher the monitoring cost.Especially,the parameters used in simulation are given in Table 1. Accordingly, the initial deployment of the network is shown in Figure 4.

Table 1 Parameters used in the simulation

Figure 4 Initial deployment of network

5.1 Simulation for routing path of sensor nodes

The results of routing path of sensor nodes are analyzed in this section.Since the routing path of each subarea is similar, only the results of arbitrary subareaS1are investigated,i.e.,the results of other subarea are omitted in this paper.

Figure 5 Results for routing path under case 1

To verify the performance of dynamic routing algorithm, two cases are considered: 1) the static underwater environment;and 2)the dynamic underwater environment.Under case 1, the nodes in network are stationary. Based on this, the routing path of sensor nodes in this paper is shown in Figure 5(a).Correspondingly,the residual energy of sensor nodes under the method in this paper is described in Figure 5(b).In Li et al.(2016a),the sensor nodes forward the data to data collector through the neighborhood nodes,which is not the optimal communication topology. Under the same assumption, the routing path and residual energy of sensor nodes are shown in Figure 5(c) and Figure 5(d),respectively. In addition, a local routing decision strategy was proposed to complete data collection from sensor nodes in Yan et al.(2018).Accordingly,the routing path and residual energy of sensor nodes are shown in Figure 5(e) and Figure 5(f),respectively.From Figures 5(a)-(d),one knows that the method in this paper can prolong the lifetime of network and the robustness of network can be guaranteed.Referring to (Shames and Summers 2015), one knows that the rigid graph with the larger eigenvalues of determinant has better algebraic properties and the stability of the network is improved accordingly. Inspired by this, the trace ofX(G)under the methods in(Yan et al.2018)and this paper is shown in Figure 6(a). From Figure 6(a), Figures 5(a)-(b),Figures 5(e)-(f),we can see that the methods in(Yan et al.2018) and this paper are both effective for data collection and the stability of network is improved in this paper.

In case 2, the sensor nodes can be moved under the influence of ocean current. Based on this, the routing path and residual energy of sensor nodes under the method in this paper are shown in Figure 7(a) and Figure 7(b), respectively. Under the same assumption, the routing path and residual energy of sensor nodes under the method in Li et al. (2016a) are shown in Figure 7(c) and Figure 7(d),respectively. Correspondingly, the routing path and residual energy of sensor nodes under the method in Yan et al.(2018) are shown in Figure 7(e) and Figure 7(f), respectively. Besides that, the trace ofX(G)under two methods is shown in Figure 6(b). Obviously, the network lifetime can be prolonged by compare with the ones in Li et al.(2016a)and the stability of network can be improved by compare with the ones in Yan et al.(2018).Therefore,the proposed routing path method is meaningful.

5.2 Simulation for path planning and tracking of AUV

In order to verify the effective of the path planning in this paper, the VoI collected by data collectors are set as:RC1=0,RC2=2.4,RC3=1.8,RC4=3.5,RC5=3.8+0.3t, andRC6=0.05, wheret∈[0, 160]. Based on the VoI, the AUV analyzes the VoI of data collectors and path planning through Algorithm 3. Accordingly, the path planning of AUV (i.e., desired trajectory of AUV) with the method in this paper is shown in Figure 8(a).In Han et al.(2017),the traveling salesman problem (TSP) strategy was developed to plan the trajectory of AUV.Under the same assumption,the path planning of AUV with the method in Han et al.(2017) is described in Figure 8(b). From Figures 8(a)-(b),we can seen that the path planning of AUV under the method in this paper is more appropriate comparing with the method in Han et al. (2017), since the VoI of data collectors 1 and 6 are small, they do not need to be accessed by AUV. Of note, accessing valueless data collectors can increase the offload time for the collected data to the control center on the sea surface.To verify the effectiveness of the proposed algorithm, the environmental obstacles are set in paths 2 and 3,i.e.,the pink area in Figure 8(c).Accordingly, the path planning of AUV from data collector 2 to data collector 3 is shown in Figure 8(c),where the optimal path of AUV is represented by red origin. Clearly, the Q-learning based AUV path planning algorithm in this paper can effectively avoid the impact of environmental obstacles.

Figure 6 Algebraic properties

Figure 7 Results for routing path under case 2

Figure 8 Simulation results for path planning and trajectory tracking of AUV

Based on the desired trajectory of AUV,the PID tracking controller is employed for tracking. In the simulation, we only give the part of 2→3.Based on this,the tracking trajectory with the method in this paper is shown in Figure 8(d).Correspondingly, the tracking position and tracking error are shown in Figure 8(e) and Figure 8(f), respectively.According to Thenozhi and Yu (2014), we found that the PD controller can achieve tracking but has a steady-state error,which can be reduced by introducing an integral term.Therefore,the PID controller can achieve a zero steady-state error.Based on this, the tracking trajectory, tracking position,and tracking error under the PD tracking controller are shown in Figures 8(d)-(f). Obviously, PID controller has better performance than PD controller.

6 Conclusion

With consideration of energy consumption, network robustness,and environmental obstacles,the energy-efficient data collection issue for IoUT has been studied in this paper.At the beginning,the potential game based optimal rigid graph strategy is proposed to balance the tradeoff between the energy consumption and the network stability.Based on optimal communication topology,the Q-learning dynamic routing algorithm for sensor nodes is developed to collect data from sensor node to data collector.Then,Qlearning based AUV path planning algorithm is presented for AUV to plan trajectory to collect data from data collector,through which the PID tracking controller is utilized to track desired trajectory. Finally, simulation results show that the proposed algorithm can ensure the energy efficient and network stability by compare with some existing works,and the PID tracking controller can track desired trajectory well. In the future, we will further expand the verification fields of the experimental platform.

FundingSupported by the National Natural Science Foundation of China (61873345, 62222314), the Distinguished Young Foundation of Hebei Province (F2022203001), the Central Guidance Local Foundation of Hebei Province (226Z3201G), the three-three-three Foundation of Hebei Province (C20221019), and the Open Fund Project of Key Laboratory of Ocean Observation Technology, MNR(2021klootA02).