Target tracking method of Siamese networks based on the broad learning system

2023-12-01

CAAI Transactions on Intelligence Technology, 2023, Issue 3

Dan Zhang | C. L. Philip Chen | Tieshan Li | Yi Zuo | Nguyen Quang Duy

1 Navigation College, Dalian Maritime University, Dalian, China

2 Innovation and Entrepreneurship Education College, Dalian Minzu University, Dalian, China

3 Computer Science and Engineering College, South China University of Technology, Guangzhou, China

4 Department of Computer and Information Science, Faculty of Science and Technology, University of Macau, Macau, China

5 School of Automation Engineering, University of Electronic Science and Technology of China, Chengdu, China

6 Faculty of Navigation, Vietnam Maritime University, Haiphong, Vietnam

Abstract Target tracking has a wide range of applications in intelligent transportation, real-time monitoring, human-computer interaction and other areas. However, during tracking the target is prone to deformation, occlusion, loss, scale variation, background clutter, illumination variation, etc., which makes accurate, real-time tracking very challenging. Tracking based on Siamese networks has promoted the application of deep learning in target tracking, ensuring both accuracy and real-time performance. However, because these networks are trained offline, it is difficult for them to deal with fast motion, serious occlusion, loss and deformation of the target during tracking. Therefore, learning new features of the target quickly and updating the target position online in time is very helpful for improving the performance of Siamese networks. The broad learning system (BLS) has a simple network structure, high learning efficiency and strong feature learning ability. Considering the problems of Siamese networks and the characteristics of BLS, a target tracking method based on BLS is proposed. The method combines offline training with fast online learning of new features; it not only adopts the powerful feature representation ability of deep learning but also uses the BLS for re-learning and re-detection. The broad re-learning information is used for re-detection when serious occlusion or similar problems occur during tracking, so as to change the selection of the Siamese network's search area, solve the problem that the search range cannot keep up with the fast motion of the target, and improve adaptability. Experimental results show that the proposed method achieves good results on three challenging datasets and improves the performance of the basic algorithm in difficult scenarios.

KEYWORDS broad learning system, Siamese network, target tracking

1 | INTRODUCTION

Target tracking is an important research topic in computer vision and is widely used in automatic driving, human-computer interaction, video surveillance and other areas [1]. In target tracking the target is specified in the first frame, but deformation, occlusion, loss, scale variation, illumination variation and complex or similar backgrounds arise during tracking, which makes real-time and accurate tracking very difficult. In recent years, tracking algorithms have made great progress and many fast and accurate methods have been studied, but it is still difficult to overcome the various challenges and achieve high-precision, real-time tracking. The strong feature learning ability of deep learning has produced good results in target tracking, for example DLT [2] and DNT [3]. However, because of the long training time and the large amount of calculation, ensuring real-time tracking is always a concern for tracking algorithms based on deep learning. Tracking algorithms based on pre-training [4–8] have achieved good tracking results and solved the problems of insufficient samples and the influence of online updates on tracking speed. However, they cannot fully adapt to the many situations the target encounters during tracking. The MDNet [9] and CNT [10] algorithms use online fine-tuning to improve tracking adaptability to a certain extent, but real-time performance is affected. In 2016, Bertinetto et al. proposed the SiamFC [11] method, which is based on offline end-to-end training, learns a similarity function and uses correlation operations for matching, achieving good results in both accuracy and real-time performance. This method has received great attention, and many improvements have been built on it, such as CFNet [12], FlowTrack [13], SiamRPN [14], SiamMask [15], SiamFC++ [16] and SiamRPN++ [17]. Many other tracking algorithms also use the Siamese network structure, such as Transformer tracking [18], GradNet [19], 'Skimming-Perusal' tracking [20] and GOTURN [21].

Although SiamFC undergoes extensive offline training, it adapts to the target tracking process only to a certain extent. Because the target's appearance changes frequently during tracking, the learning ability of the algorithm during tracking needs to be strengthened further to achieve accurate tracking. The adaptive ability of SiamFC is weak when the target moves fast, mainly because it searches for the target only within a certain range. If the fast motion takes the target beyond this search range, the accurate position of the target cannot be matched. In addition, the cosine window penalty is also affected, which in turn affects the tracking result. When the target is occluded or lost, it is difficult for the algorithm to find the target accurately. The BLS [22, 23] is a new neural network method proposed by Professor Chen in 2017–2018. It computes quickly, can train on a large amount of data in a short time and has strong learning ability. Many applications and improvements have been built on this method, such as [24–27], and they have achieved good results.

This paper proposes a target tracking method of Siamese networks based on the BLS (BLSiam). The method uses the fast training speed and strong learning ability of BLS to train and track online during SiamFC tracking. When SiamFC tracking is not accurate, re-learning and re-detection are carried out to make up for SiamFC's lack of online learning and its limited search range, making the algorithm more adaptable. The proposed method is tested on three general and challenging datasets, and the results show that it improves the tracking accuracy of SiamFC and achieves better real-time performance and accuracy than other popular tracking algorithms.

2 | RELATED WORKS

2.1 | Siamese networks target tracking

In recent years, target tracking methods based on Siamese networks have attracted great attention and achieved good results. These methods mainly train a similarity function offline on a large amount of data; early examples include SINT [28] and SiamFC. SiamFC crops and resizes the input images, takes the image pair as the input of the network and calculates the similarity between the target and the candidate boxes through correlation operations. In this process, 3 or 5 scales are used. The method has low computational cost, simple operation and high efficiency. Subsequently, a large number of improved algorithms appeared that further improve performance. SiamRPN is a typical example. Based on the Siamese network framework, it introduces a region proposal network (RPN), which can predict object categories and regress a bounding box. This regression is better suited to the scale variation that occurs during tracking, solves the problem of bounding box regression and improves tracking accuracy. There are many subsequent improvements, such as ATOM [29], Ocean [30], DiMP [31] and DaSiamRPN [32]. These methods mainly improve on bounding box estimation, the backbone network and online updating. Some methods also combine correlation filtering with deep learning. The improved networks are either more accurate or faster, but it is difficult to improve accuracy while keeping good real-time performance. Therefore, new methods are needed that obtain more effective features of the target during tracking with less impact on real-time performance.

2.2 | Training and learning

The target tracking task usually specifies the target to track in the first frame. With online training and tracking, accurate data can be obtained only from the first frame, and subsequent tracking relies on the results of the previous frame and some update operations. Classic tracking methods of this kind include TLD [33] and Struck [34]. However, when only the data of the first frame are used for learning and judgement, there is too little data to handle subsequent changes in the target's appearance. Even if some online updates are carried out, the algorithm cannot show strong adaptability.

With the development of deep learning and technological progress, it has become possible to learn from massive data. Deep learning methods have good tracking accuracy, but the large amount of training data and the complex calculation affect the real-time performance of the algorithm. Pre-training combined with online fine-tuning has therefore played a certain role, but accuracy and real-time performance remain difficult to achieve on challenging target tracking datasets. Offline training on massive data, which learns more features of targets, contributes to accurate tracking, but it also has problems such as long offline training time and poor adaptability to target loss and occlusion.

2.3 | Broad learning system

The original intention of BLS is to solve the problems of complex structure and long training time in deep networks. The network is mainly divided into two layers: an input layer and an output layer. The input layer is composed of feature nodes and enhancement nodes. When data are added for incremental training, the previously trained data do not need to be trained again, which greatly improves the efficiency of network learning. BLS uses sparse representation in network calculation and learning, which expresses features better. Experimental results show that this method has strong classification ability and that training is significantly faster. From the previous analysis and the performance of BLS, it is clear that BLS is well suited to online training and updating in target tracking. It can improve the accuracy and adaptability of the original method without excessively affecting the real-time performance of the algorithm, and it has good re-learning and re-detection capabilities.

3 | BROAD‐SIAMESE TARGET TRACKING METHOD

The Broad-Siamese algorithm (BLSiam) is a target tracking algorithm that uses BLS within Siamese networks. The method makes full use of the fast training and sparse representation of BLS and combines it with SiamFC (with 3 scales) for offline training and online tracking, which improves the feature representation and learning ability of the network. The specific structure of the network is shown in Figure 1.

3.1 | Algorithm structure

FIGURE 1 Broad learning Siamese networks target tracking method framework

The overall framework of the method is shown in Figure 1. SiamFC is the basic algorithm of the tracking method in this paper. SiamFC uses a large amount of data to train the network offline. The input is divided into a target image and a search area image. Features are obtained through the network, and a correlation operation finds the area in the search area that is similar to the target and determines the target position in the current frame. The method in this paper first performs target tracking with SiamFC. If the tracking result score is greater than the threshold thred2, the tracking is considered accurate. If the tracking result score is also greater than thred1, positive and negative samples of the frame are obtained according to the tracking result: multiple candidate boxes are generated near the target location, candidate boxes whose IoU with the target location is greater than or equal to 0.7 are taken as positive samples, and those whose IoU is less than or equal to 0.3 are taken as negative samples. While tracking is accurate, samples are collected and saved for each frame. If the tracking result score is less than thred1, the tracking is considered inaccurate; if the BLS has not yet been trained (flag = 0), the collected samples are used as the input of BLS for online training, and flag is set to 1. Because BLS trains quickly, the training can be completed online. The trained BLS can be used for tracking and evaluation, and this training is a re-learning process. When the baseline tracker is tracking inaccurately, that is, when the tracking result score is less than thred2, the trained BLS evaluates candidate boxes according to the newly learnt features and selects the candidate box with the highest score in the current frame as the tracking result, which is a process of re-detection. The new detection result is then used to update the SiamFC tracking position, and the next frame is tracked. If the score is still smaller than thred2, the broad learning tracker is again used for re-detection; if the score is greater than thred2, SiamFC continues the tracking.
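The sample labelling rule described above (IoU of at least 0.7 for positives, at most 0.3 for negatives) can be illustrated with a minimal Python sketch. The box format `(x, y, w, h)` and the function names `iou` and `label_samples` are assumptions made for illustration; the paper's own implementation is in MATLAB and is not shown in the text.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x, y, w, h)."""
    ax2, ay2 = a[0] + a[2], a[1] + a[3]
    bx2, by2 = b[0] + b[2], b[1] + b[3]
    inter_w = max(0.0, min(ax2, bx2) - max(a[0], b[0]))
    inter_h = max(0.0, min(ay2, by2) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0


def label_samples(target, candidates, pos_iou=0.7, neg_iou=0.3):
    """Split candidate boxes into positive and negative samples.

    Candidates whose IoU with the tracked target lies strictly between
    neg_iou and pos_iou are ambiguous and are discarded.
    """
    positives = [c for c in candidates if iou(c, target) >= pos_iou]
    negatives = [c for c in candidates if iou(c, target) <= neg_iou]
    return positives, negatives


target = (0, 0, 10, 10)
candidates = [(0, 0, 10, 10), (1, 0, 10, 10), (5, 0, 10, 10), (20, 20, 10, 10)]
positives, negatives = label_samples(target, candidates)
```

Discarding the candidates between the two IoU bounds keeps the positive and negative classes clearly separated, which matters when only a few frames of samples are available for training.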

In this process, tracking may be inaccurate from the very beginning, that is, the score of the second frame may already be less than thred2. However, the tracking must be accurate in the first frame, and the method in this paper starts to collect positive and negative samples in the first frame. Even in this situation, samples are therefore available for training the BLS to support subsequent tracking. Although very few samples are available for training in this case, the method can still operate. BLS may not guarantee that the subsequent re-detection is correct, but in this extreme case the baseline tracker is already inaccurate, and BLS can at least attempt a correction. If the correction succeeds, so much the better; if not, the tracking accuracy of the baseline tracker is not affected.

SiamFC is trained offline, and the target features used in the correlation calculation are always those of the first frame. In a complex tracking process the target's appearance changes, which is difficult for the algorithm to overcome. In this paper, the positive and negative samples from the tracking process are used for BLS online training, which describes the specific changes and characteristics of the target during tracking well. Although the number of samples collected during tracking is not very large, they describe the characteristics of the target and the background changes more appropriately, which greatly improves the adaptability of the tracker. BLS trains quickly, and completing the training during tracking does not affect the tracking speed much, which makes it a good online update method for the offline basic tracker. Broad online learning and deep offline learning are a good combination: the former uses the advantages of BLS to train quickly and obtain new features, and the latter uses the ability of deep learning to obtain effective features from a large amount of data. Although BLS learns only briefly, the trained tracker makes full use of the characteristics of the online target, so it can effectively perform re-detection, update the tracking position of the basic tracker, and improve the accuracy and adaptability of subsequent tracking.

3.2 | Establishment of BLS

BLS establishes an input layer from feature nodes and enhancement nodes, which can be dynamically updated and increased. The network is solved in the form of a pseudo-inverse, which is fast and does not require the features of previous nodes to be recalculated, greatly improving computing efficiency. At the same time, sparse autoencoding is used in broad learning to obtain better features. The specific operation of the algorithm is as follows.

In BLS, the pseudo-inverse solution is used, and it can also be obtained by solving the following optimization problem:
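In the standard BLS notation of [23], and consistent with the symbols σ1, σ2, u, v, λ and W used in the following paragraph, this problem is usually written as below. The names A (the input-layer matrix) and Y (the expected outputs) are notational assumptions here.

```latex
\arg\min_{W}\; \lVert A W - Y \rVert_{v}^{\sigma_1} + \lambda \lVert W \rVert_{u}^{\sigma_2} \tag{1}
```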

When σ1 = σ2 = u = v = 2, the optimization problem above becomes L2-norm regularisation. λ is a further constraint on the sum of the squared weights, and the weights are denoted by W [23]. This solution is equivalent to ridge regression, which approximates the generalised inverse. If λ = 0, the problem reduces to the least-squares problem and the solution is the pseudo-inverse solution. So we have:
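Under the same notational assumptions (A for the input-layer matrix, Y for the expected outputs), the ridge regression solution and its limit are commonly written as follows. The tag (3) matches the later references to Equation (3) as the pseudo-inverse approximation.

```latex
\begin{align}
W &= \left(\lambda I + A^{\mathsf T} A\right)^{-1} A^{\mathsf T} Y \notag\\
A^{+} &= \lim_{\lambda \to 0} \left(\lambda I + A^{\mathsf T} A\right)^{-1} A^{\mathsf T} \tag{3}
\end{align}
```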

Sparse autoencoding is also used in BLS. To obtain sparse features from the training data, set σ2 = u = 1 and σ1 = v = 2 in the optimization problem of Equation (1); then,
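With these exponents the general problem specialises to an L1-regularised least-squares (lasso) problem. A sketch in the notation of the surrounding text, where Z is assumed to be the feature matrix and X the training data, following [23]:

```latex
\arg\min_{\hat{W}}\; \lVert Z \hat{W} - X \rVert_2^2 + \lambda \lVert \hat{W} \rVert_1 \tag{4}
```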

Here Ŵ is the sparse autoencoder solution. There are many ways to solve the optimization problem of Equation (4); the alternating direction method of multipliers (ADMM) [35] is one of them. The problem in Equation (4) can be considered as the following problem [23]:

where f(w) = ‖Zw − x‖_2^2 and g(w) = λ‖w‖_1. In ADMM, the above problem can be written as
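For reference, the usual ADMM treatment of the problem of minimising f(w) + g(w) can be sketched as follows. The auxiliary variable o, the scaled dual variable ν, the penalty parameter ρ > 0 and the soft-thresholding operator S are standard ADMM notation assumed here, not symbols taken from this text. The factor 2 in the w-update comes from differentiating f(w) = ‖Zw − x‖_2^2 as defined above.

```latex
\begin{align*}
&\arg\min_{w,\,o}\; f(w) + g(o) \quad \text{s.t.}\quad w - o = 0,\\
&w_{k+1} = \left(2 Z^{\mathsf T} Z + \rho I\right)^{-1}\left(2 Z^{\mathsf T} x + \rho\,(o_k - \nu_k)\right),\\
&o_{k+1} = S_{\lambda/\rho}\!\left(w_{k+1} + \nu_k\right),\\
&\nu_{k+1} = \nu_k + w_{k+1} - o_{k+1},\\
&S_{\kappa}(a) = \operatorname{sign}(a)\,\max\left(\lvert a \rvert - \kappa,\, 0\right).
\end{align*}
```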

The feature mapping is composed of n groups with k nodes in each group, and q nodes are added as enhancement nodes in m groups. In this way, the input layer of BLS is established. The specific process is shown in Table 1 and Figure 2.

Figure 2 uses a set of training images (for example, BlurCar1) to detail the process of establishing the BLS feature nodes and enhancement nodes. As the figure shows, each image is preprocessed first, and the sample size is determined by the target size in the training image. The two-dimensional sample image is converted into a one-dimensional vector as the input for calculation. W_ei and β_ei are randomly generated, then Z^n is obtained (Equation 9) and H^m is established (Equation 10); finally, Z^n and H^m are combined to form the input layer of BLS.
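The construction of the input layer can be sketched in plain Python as follows. This is a simplified illustration under stated assumptions, not the authors' MATLAB implementation: the weights are drawn uniformly from [-1, 1], the feature nodes are left linear, `tanh` is used as the enhancement activation, and the sparse-autoencoder refinement of the feature weights is omitted. All function names are hypothetical.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def random_affine(x, out_dim, rng):
    """Return x @ W + beta, with W and beta drawn uniformly from [-1, 1]."""
    in_dim = len(x[0])
    w = [[rng.uniform(-1, 1) for _ in range(out_dim)] for _ in range(in_dim)]
    beta = [rng.uniform(-1, 1) for _ in range(out_dim)]
    return [[v + b for v, b in zip(row, beta)] for row in matmul(x, w)]


def build_input_layer(x, n_groups, k_nodes, m_groups, q_nodes, seed=0):
    """Build [Z | H]: n groups of k feature nodes, m groups of q enhancement nodes."""
    rng = random.Random(seed)
    z = [[] for _ in x]
    for _ in range(n_groups):
        group = random_affine(x, k_nodes, rng)  # one group of feature nodes
        z = [zr + gr for zr, gr in zip(z, group)]
    h = [[] for _ in x]
    for _ in range(m_groups):
        group = random_affine(z, q_nodes, rng)  # enhancement nodes are fed by Z
        h = [hr + [math.tanh(v) for v in gr] for hr, gr in zip(h, group)]
    return [zr + hr for zr, hr in zip(z, h)]


# Two flattened toy samples with three pixels each.
samples = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]
layer = build_input_layer(samples, n_groups=8, k_nodes=10, m_groups=1, q_nodes=300)
```

With 8 groups of 10 feature nodes and 300 enhancement nodes, each sample is mapped to a row of 380 values, which is the matrix that the output weights are later fitted on.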

3.3 | Re‐learning and re‐detection

The previous section explained how the input layer of BLS is established and how features are acquired. Broad learning and deep learning have different mechanisms, so new features can be learnt during learning to improve the adaptability of the basic algorithm. The method in this paper combines BLS with SiamFC and uses the fast training of BLS to carry out online learning during tracking. This learns the characteristics of the target as it is tracked and makes up for the original network's inability to learn and update online. The specific re-learning process is as follows.

In Section 3.2, the input layer of the BLS is established. For the output layer, let Y be the expected output matrix. Therefore, the equation of the BLS can be written as:

where W^m = [Z^n | H^m]^+ Y. W^m are the weights of the BLS, which can be obtained by the ridge regression approximation in Equation (3). That is, the output Y is added to Table 1 with Y = [Z^n | H^m] W^m; setting [Z^n | H^m] as A_m^n, (A_m^n)^+ can be calculated by Equation (3), and then W^m can be calculated. In this paper, the output of the positive samples is set to 1 and the output of the negative samples is set to 0.
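A minimal sketch of this step, assuming the ridge form W = (λI + AᵀA)⁻¹AᵀY with a small λ, is given below in plain Python. The helper names (`solve`, `ridge_weights`, `score`) and the toy feature rows are hypothetical; a practical implementation would use a linear algebra library rather than hand-written elimination.

```python
def transpose(a):
    return [list(col) for col in zip(*a)]


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def solve(m, rhs):
    """Solve m @ x = rhs by Gauss-Jordan elimination with partial pivoting."""
    n = len(m)
    aug = [list(m[i]) + list(rhs[i]) for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * pv for v, pv in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def ridge_weights(a, y, lam=1e-3):
    """Return W = (lam * I + A^T A)^-1 A^T Y."""
    at = transpose(a)
    gram = matmul(at, a)
    for i in range(len(gram)):
        gram[i][i] += lam  # ridge term keeps the system well conditioned
    return solve(gram, matmul(at, y))


def score(a, w):
    """Evaluate rows of A with the learnt weights (single output column)."""
    return [row[0] for row in matmul(a, w)]


# Toy input-layer rows: two positive samples (label 1), two negative (label 0).
a = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.1], [0.1, 1.0]]
y = [[1.0], [0.0], [1.0], [0.0]]
w = ridge_weights(a, y)
scores = score(a, w)
```

Because the labels are 1 and 0, the product of a candidate's input-layer row with W can be read directly as a confidence that the candidate is the target.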

TABLE 1 Input layer establishment process of BLS

By training on the positive and negative samples collected during tracking, a BLS tracking evaluator is obtained, which can be used to evaluate the candidate boxes and select a better candidate box as the tracking result. This is a process of re-detection.

Two thresholds are set. The first, thred1, is the threshold for sample collection and for starting training. The second, thred2, is the threshold for re-detection. If the tracking score is greater than thred1, SiamFC tracking is considered accurate, and positive and negative samples are generated near the tracking result and collected. Before BLS training, flag = 0. If the tracking result score is less than thred1, the tracking is considered inaccurate, so no samples are collected from that frame; the samples already obtained are used to train the BLS, and flag is set to 1. When the tracking score is greater than thred2, SiamFC tracking is considered accurate. When the tracking score is lower than thred2, the tracking result is considered inaccurate, and if flag = 1 the BLS tracker is enabled for re-detection. The two thresholds can be set to the same value (thred1 = thred2). In this case, when the tracking is not accurate, samples are not collected, the BLS is trained directly, and the BLS tracker is then used to evaluate the candidate boxes for re-detection. The two thresholds may also differ (thred1 ≠ thred2). This is generally used when the tracking sequence is long and enough samples can be collected early in tracking to meet the requirements of training. If the tracking score is less than thred1, collection stops and the BLS is trained, but the BLS evaluator is not used for tracking immediately. Instead, the BLS tracker is enabled for re-detection only when the tracking score is less than thred2. This saves training time and improves tracking efficiency. See Figure 3 for the specific usage of the thresholds. In practice, when inaccurate tracking occurs, the tracking score decreases rapidly, and the thresholds can be determined by finding this drop point, as shown in Table 2.
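One possible reading of this threshold logic is sketched below in Python. The function name `tracking_step`, the state dictionary and the action strings are hypothetical, and the sketch assumes that the BLS is trained once, the first time the score falls below thred1 after samples have been collected.

```python
def tracking_step(score, state, thred1, thred2):
    """Decide what to do for one frame and update state in place.

    state holds "flag" (0 before BLS training, 1 after) and "samples"
    (the number of frames whose samples have been stored so far).
    """
    actions = []
    if score > thred1:
        # Tracking is reliable: store positive and negative samples.
        state["samples"] += 1
        actions.append("collect_samples")
    elif state["flag"] == 0 and state["samples"] > 0:
        # First unreliable frame: train the BLS on the stored samples.
        state["flag"] = 1
        actions.append("train_bls")
    if score < thred2 and state["flag"] == 1:
        # Result is inaccurate and a trained BLS exists: re-detect.
        actions.append("redetect_with_bls")
    else:
        actions.append("keep_siamfc_result")
    return actions


# Different thresholds: training and re-detection are triggered separately.
state = {"flag": 0, "samples": 0}
log = [tracking_step(s, state, thred1=0.6, thred2=0.4) for s in (0.9, 0.5, 0.3, 0.8)]

# Equal thresholds: training and re-detection happen in the same frame.
state_equal = {"flag": 0, "samples": 0}
log_equal = [tracking_step(s, state_equal, thred1=0.5, thred2=0.5) for s in (0.9, 0.4)]
```

In the first run the BLS is trained at the score 0.5 but only takes over at 0.3, which mirrors the case thred1 ≠ thred2 described above; in the second run both steps happen on the same frame.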

FIGURE 2 Broad learning system input layer setup process

FIGURE 3 Thresholds usage flowchart

TABLE 2 A partial tracking thresholds determination table

As can be seen from Table 2, when tracking is accurate the evaluation score is high; when tracking drift or scale maladaptation occurs, the evaluation score drops. There is also the case where the background is complex and similar to the target: the tracking is not accurate but the score remains relatively high, although it is generally lower than the score for accurate tracking. Therefore, when the score differs greatly from those of the first frame and the previous two frames, a score near it can be used as the critical point for the thresholds.

The BLS tracker tracks mainly by generating multiple candidate boxes around the target position of the previous frame and evaluating them. This process is therefore a rough positioning, intended mainly for online correction of large drifts in the SiamFC tracking position. In this paper, the generated candidate boxes are preprocessed. After the relevant formula transformations in Section 3.1, the feature data featureX are obtained. The parameters used in the formula are obtained from the training in Section 3.2; the evaluation results res of the candidate boxes are then:

The highest score, MaxX = max(res), is chosen, and the candidate box gtMax corresponding to this score is taken as the final tracking result, which completes the re-detection process. When candidate boxes are generated, the displacement step is small, but the covered range is larger than the search range of SiamFC, so when fast motion occurs a larger area can be searched to help find the target position. Since the BLS is trained online, it has strong discrimination and adaptability, and it can generally find the rough position of the target within a few frames. When the position score is greater than thred2, the tracking position of SiamFC can be updated online to complete re-detection and improve the overall adaptability and accuracy of the algorithm.
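The re-detection step can be sketched as a search over translated copies of the previous box. In this Python sketch the grid layout, the default `radius` and `step` values and the `score_fn` callback (standing in for the trained BLS evaluator) are all assumptions for illustration; the paper does not specify how the candidate boxes are sampled.

```python
def candidate_boxes(prev_box, radius, step):
    """Translate prev_box (x, y, w, h) over a square grid of pixel offsets."""
    x, y, w, h = prev_box
    offsets = range(-radius, radius + 1, step)
    return [(x + dx, y + dy, w, h) for dy in offsets for dx in offsets]


def redetect(prev_box, score_fn, radius=40, step=8):
    """Return the candidate box with the highest score and that score."""
    boxes = candidate_boxes(prev_box, radius, step)
    res = [score_fn(b) for b in boxes]
    best = max(range(len(boxes)), key=res.__getitem__)
    return boxes[best], res[best]


def toy_score(box):
    """Stand-in for the BLS evaluator: peaks when the box is at (116, 92)."""
    return -abs(box[0] - 116) - abs(box[1] - 92)


best_box, best_score = redetect((100, 100, 20, 20), toy_score)
```

A small step with a large radius reflects the description above: the displacement between neighbouring candidates is small, while the covered area exceeds the baseline tracker's search range.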

4 | EXPERIMENT

4.1 | Experiment settings

The proposed method runs on a Quadro-series GPU under Windows 10, and the software is MATLAB R2019b. The SiamFC parameters are the same as in the original network. In the BLS, there are 8 groups of feature nodes with 10 nodes in each group, and the number of enhancement nodes is 300. The random weights W_ei, β_ei (i = 1, …, n) and W_hj, β_hj (j = 1, …, m) were sampled from the normal distribution on the interval [-1, 1]. To verify the effectiveness of the method, tests and comparisons were conducted on OTB100 [36], UAV123 [37] and VOT2017 [38]. The comparison methods include CT [39], IVT [40], DLT, TLD, KCF [41], CNT, CN, Struck, SRDCF [42], CFNet and SiamFC. The comparison methods adopt the parameters of the original methods. Except for SRDCF, for which the results published by its authors are used, all comparison methods were re-run in the same environment as this paper.

4.2 | Experimental results and analysis on OTB100, UAV123 and VOT2017 datasets

4.2.1 | Experimental results and analysis on the OTB100 dataset

FIGURE 4 Tracking results on the OTB100 dataset

The OTB100 dataset is one of the most widely used datasets for testing target tracking algorithms. It contains 100 challenging video sequences with 11 disturbance attributes, including fast motion, scale variation, illumination variation and occlusion. The OTB platform evaluates trackers by precision and success rate. Precision is the percentage of frames in which the distance between the centre of the tracking result and the centre of the labelled target box is less than a given threshold. The success rate is the proportion of frames in the tracking sequence whose overlap rate is greater than a certain value, where the overlap rate is the intersection-over-union between the tracking result and the real target position. In this paper, the precision and the success rate are reported mainly through OPE (one-pass evaluation), which initialises the tracker with the target position in the first frame and then computes the precision and the success rate over the sequence.
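The two measures can be illustrated with a short Python sketch. The box format `(x, y, w, h)`, the function names and the default thresholds (20 pixels and 0.5 overlap) are assumptions chosen for illustration; the OTB toolkit itself sweeps a range of thresholds to draw the precision and success plots.

```python
import math


def center_error(a, b):
    """Euclidean distance between the centres of two (x, y, w, h) boxes."""
    ax, ay = a[0] + a[2] / 2.0, a[1] + a[3] / 2.0
    bx, by = b[0] + b[2] / 2.0, b[1] + b[3] / 2.0
    return math.hypot(ax - bx, ay - by)


def overlap(a, b):
    """Intersection over union of two (x, y, w, h) boxes."""
    iw = max(0.0, min(a[0] + a[2], b[0] + b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[1] + a[3], b[1] + b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0


def precision(results, truths, dist_threshold=20.0):
    """Fraction of frames whose centre error is below dist_threshold."""
    hits = sum(center_error(r, t) < dist_threshold for r, t in zip(results, truths))
    return hits / len(truths)


def success_rate(results, truths, overlap_threshold=0.5):
    """Fraction of frames whose overlap exceeds overlap_threshold."""
    hits = sum(overlap(r, t) > overlap_threshold for r, t in zip(results, truths))
    return hits / len(truths)


truths = [(0, 0, 10, 10)] * 4
results = [(0, 0, 10, 10), (2, 0, 10, 10), (30, 0, 10, 10), (6, 0, 10, 10)]
p = precision(results, truths)
s = success_rate(results, truths)
```

In this toy sequence the third frame has drifted 30 pixels and fails both tests, while the fourth frame is close enough for the precision test but overlaps too little to count as a success.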

As can be seen from Figure 4, the proposed method achieves 79.7% precision and 72.6% success rate on the OTB100 dataset, which are 4.1% and 2.4% higher than the basic tracker SiamFC, respectively. Eleven tracking methods are compared, and the results of the top ten trackers are listed in the resulting graph. Compared with the other trackers, the precision of our method is better, while its success rate is slightly lower than that of SRDCF. The figure shows that when the location error is less than 20 and the overlap rate is higher than 60%, SRDCF performs better than the method in this paper. The main reason is that BLSiam corrects the position of the baseline tracker only roughly, by selecting among candidate boxes. In addition, its search scope is limited, so it is inferior to SRDCF in precise position tracking. However, SRDCF reports a tracking speed of 4 FPS, whereas the method in this paper runs at 45 FPS. BLSiam nevertheless plays a role in rough position correction: it corrects the tracking drift of the baseline tracker, updates the position and supports the baseline tracker's subsequent tracking, so its overall performance is good. The method in this paper makes full use of the fast training of BLS to learn and update online, and it can use few samples to obtain the features of the tracking sequence online. The drift of the basic tracker is corrected by online evaluation of candidate boxes, which improves tracking accuracy to a certain extent without affecting real-time performance. This provides a new idea for applying BLS to target tracking. The tracking speed is shown in Table 3.

TABLE 3 Comparison of average tracking speed between different methods on the OTB100 dataset

The algorithms in Table 3 were all run on the same computer (see Section 4.1). The table shows that the average tracking speed of the proposed method on OTB100 is 2 FPS lower than that of SiamFC. The online learning and updating in this paper therefore have little impact on the speed of the baseline tracker, which reflects the speed of BLS. The table also shows that the proposed method is faster on average than the deep-learning-based CNT and DLT methods. This further demonstrates the effectiveness of the proposed method.

Figure 5 lists the tracking results of the various tracking methods on nine attributes of the OTB100 dataset, such as low resolution, out of view and in-plane rotation. The figure shows that, except for deformation, occlusion and illumination variation, where our method is slightly below SRDCF, the method in this paper achieves the best results on the other attributes, and its results are all higher than those of the baseline tracker SiamFC. This indicates that, because the method is trained on positive and negative samples from the tracking process, the number of available samples is small and the newly learnt features are limited, so it is difficult to adapt fully to the appearance changes during tracking. Nevertheless, the correction effect of the method can be seen from these results.

FIGURE 5 Tracking results of 9 attributes on the OTB100 dataset

Figure 6 shows tracking results for selected frames of four challenging video sequences in the OTB100 dataset. The tracking results of our method are highlighted in red. As the figure shows, the method in this paper corrects the tracking results of SiamFC, and the corrected tracker performs better: it overcomes some of the challenges caused by occlusion, loss and fast motion during tracking and acts as an online update that yields better tracking results.

4.2.2 | Experimental results and analysis on the UAV123 dataset

The UAV123 dataset contains 123 video sequences shot by UAV. The video scenes are large and the backgrounds contain many objects. The dataset covers tracking of people, objects, cars and ships in multiple scenes. The videos have 12 attributes, such as scale variation, camera motion and similar objects, which make real-time and accurate tracking challenging. This paper runs the tracking on this dataset using the OTB platform and analyses the results with the OPE-based precision and success rate graphs. The specific tracking results are as follows.

As can be seen from Figure 7, the tracking precision and success rate of the proposed method are improved by 8.2% and 6.1% compared with the basic tracker SiamFC. The tracking results are also better than those of the other comparison methods. The UAV123 dataset has large scenes containing many objects, which poses a great challenge to tracking. The precision and success rate of the proposed method reach 79.4% and 66.8%, indicating the effectiveness of the proposed method. The comparison of tracking speed is shown in Table 4.

FIGURE 6 Partial tracking results on four challenging video sequences of OTB100 dataset

FIGURE 7 Tracking results on the UAV123 dataset

Figure 9 shows the tracking results of the proposed method and all comparison methods on six challenging video sequences in the UAV123 dataset, namely boat_6, car_1_2, car4, car11_1, group3_2 and wakeboard6_1. The figure shows that when the tracked target undergoes viewpoint change, occlusion, background clutter and fast motion, the baseline tracker SiamFC has difficulty overcoming these challenges because it relies only on offline training. The method in this paper, by contrast, learns online from samples of the tracked sequence, which improves the adaptability of the algorithm to the sequence to a certain extent, overcoming some tracking challenges and improving tracking accuracy.

TABLE 4 Comparison of average tracking speed between different methods

FIGURE 8 Tracking results of 12 attributes on the UAV123 dataset

FIGURE 9 Partial tracking results on six challenging video sequences of UAV123 dataset

4.2.3 | Experimental results and analysis on the VOT2017 dataset

The VOT2017 dataset contains 60 challenging video sequences, including tracking of small targets among multiple similar objects and tracking of fast-moving targets among similar objects and against similar backgrounds. The VOT2017 sequences are mainly short-term videos that cover tracking of cars, objects, people and so on. The scenes are relatively complex, with attributes such as illumination variation, camera motion and scale variation, and therefore provide good support for verifying the effectiveness of a tracking method. Accuracy (A), robustness (R) and expected average overlap (EAO) are three important evaluation indexes in VOT. Accuracy evaluates how well the box predicted by the tracker overlaps the target. For the t-th frame in the sequence, it is defined as the overlap between the ground-truth box and the tracker's box, referred to as Equation (13).

FIGURE 10 Accuracy-robustness results on the VOT2017 dataset

FIGURE 11 Expected average overlap (EAO) rates on the VOT2017 dataset

In this definition, the two regions compared are the manually annotated ground-truth target box and the box output by the tracker. The accuracy of a tracking sequence is obtained by computing the accuracy of each frame in the video and then averaging over all frames. Robustness is obtained from the average failure rate, where a failure is recorded once the overlap measure of Equation (13) drops to zero. The EAO is calculated as the average overlap rate of the tracker over multiple short-term video sequences.
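The accuracy and failure counting described above can be sketched as follows. This is a simplified illustration that assumes the overlap measure is the standard intersection-over-union of (x, y, w, h) boxes and that a failure is counted each time the overlap falls from a positive value to zero; it does not reproduce the reinitialisation protocol of the VOT toolkit, and all function names are hypothetical.

```python
def iou(box_a, box_b):
    """Intersection-over-union of two (x, y, w, h) boxes (assumed overlap measure)."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    inter_w = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0

def per_frame_overlaps(pred_boxes, gt_boxes):
    """Overlap of the tracker output with the ground truth for every frame."""
    return [iou(p, g) for p, g in zip(pred_boxes, gt_boxes)]

def sequence_accuracy(overlaps):
    """Sequence accuracy: per-frame overlaps averaged over all frames."""
    return sum(overlaps) / len(overlaps)

def count_failures(overlaps):
    """Count the events where the overlap drops from a positive value to zero."""
    failures = 0
    previously_tracking = True
    for value in overlaps:
        if value == 0.0 and previously_tracking:
            failures += 1
        previously_tracking = value > 0.0
    return failures

# Toy sequence: perfect, partial, lost, recovered.
gt = [(0, 0, 10, 10)] * 4
pred = [(0, 0, 10, 10), (5, 0, 10, 10), (20, 20, 10, 10), (0, 0, 10, 10)]

overlaps = per_frame_overlaps(pred, gt)
accuracy = sequence_accuracy(overlaps)
failures = count_failures(overlaps)
```

Averaging `accuracy` and `failures` over all sequences of a dataset gives dataset-level figures comparable in spirit to the A and R values discussed here, under the simplifications stated above.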

Figure 10 shows the accuracy-robustness results of the proposed method and the comparison methods on the VOT2017 dataset. It can be seen from the figure that the proposed method has good accuracy and robustness on the challenging VOT2017 dataset. Figure 11 shows the EAO of the proposed algorithm and the comparison algorithms on VOT2017. As can be seen from the figure, the proposed method ranks second, behind SiamFC, and Figure 10 shows that it improves only slightly on SiamFC. The main reason is that the proposed method makes a rough adjustment of the position when the baseline method tracks inaccurately, whereas the VOT baseline experiment reinitialises the tracker after tracking inaccuracies have accumulated over a certain number of frames. This evaluation protocol can therefore reinitialise the tracker before the algorithm has fully corrected itself, although this has little impact on the overall performance of the algorithm. At the same time, it shows that the proposed method is suited to online learning and updating in long-term tracking, where the tracker can correct itself effectively without reinitialisation.

FIGURE 12 Overlap rate graph of unsupervised experiments (average value) on the VOT2017 dataset

TABLE 5 Comparative results of ablation experiments

In the unsupervised VOT test, the tracker has no access to the ground-truth results and tracks the target box from the given value in the first frame only. Figure 12 shows the overlap rates of the proposed method and the comparison methods under unsupervised testing. As can be seen from Figure 12, BLSiam achieves a high overlap rate without supervision, higher than that of the baseline tracker SiamFC, and ranks first among the compared tracking algorithms. This shows that BLSiam can effectively track the target given only the ground truth of the first frame, which illustrates the role of the proposed algorithm. It also further explains why the EAO obtained above under supervision is slightly lower than that of SiamFC and why the accuracy does not improve much.

Table 4 compares the tracking speed of the proposed method and SiamFC on the UAV123 and VOT2017 datasets. Because this paper mainly adds BLS-based re-learning and re-detection to the baseline tracker, the main concern is the impact on the speed of the baseline, so the speed comparison is limited to these two methods. As can be seen from the table, the proposed method is 2 FPS slower than SiamFC on the UAV123 dataset and 8 FPS slower on the VOT2017 dataset. The proposed method therefore has little impact on the running speed of the baseline tracker while improving accuracy, which reflects its effectiveness.
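For reference, an average tracking speed in frames per second can be measured with a sketch like the one below. The paper does not describe its timing protocol, so this is only one plausible way to obtain such a figure; `track_fn` and `frames` are hypothetical placeholders for a tracker's per-frame update and the frames of a sequence.

```python
import time

def average_fps(track_fn, frames):
    """Average frames per second of `track_fn` over a list of frames.

    `track_fn` is a hypothetical callable that processes one frame;
    only the time spent inside the tracking loop is measured.
    """
    if not frames:
        raise ValueError("frames must not be empty")
    start = time.perf_counter()
    for frame in frames:
        track_fn(frame)
    elapsed = time.perf_counter() - start
    return len(frames) / elapsed if elapsed > 0 else float("inf")

# Usage with a stand-in tracker that only records the frames it has seen.
seen = []
fps = average_fps(seen.append, list(range(100)))
```

Running the same function on the baseline and on the modified tracker over identical sequences yields the kind of FPS difference reported in Table 4.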

4.3 | Ablation experiment and analysis

The ablation experiment verifies the validity of the proposed module. The proposed method adds BLS-based online learning and updating to the baseline tracker SiamFC. The proposed method (BLSiam) is compared with the baseline method SiamFC on the OTB100, UAV123 and VOT2017 datasets, and the differences in performance are analysed. Table 5 lists the area under the curve (AUC) of the success graph for the proposed method and for SiamFC on the OTB100 and UAV123 tracking datasets, as well as the accuracy on the VOT2017 dataset. As can be seen from the table, compared with SiamFC, the AUC of the proposed method increases by 2.5% on OTB100 and by 5.1% on UAV123, and the accuracy on the VOT2017 dataset increases by 0.88%. These increases illustrate the validity of the proposed module.

5 | CONCLUSION

This paper proposes a Siamese network tracking method based on the BLS. The method combines offline training with online learning and updating and makes full use of the advantages of SiamFC and the BLS. By training on the tracking sequence online, new features that arise during tracking can be learnt and the tracking drift of the baseline tracker can be corrected, which improves the adaptability of the algorithm while preserving its real-time performance.

The proposed method is compared with other high-performing trackers on three challenging datasets, and its accuracy and success rate are better. However, because online learning and updating are triggered by thresholds, inaccuracies that cannot be corrected can still occur during tracking; this is where the algorithm needs further improvement in the future. The proposed method combines online broad learning with deep offline training, which provides a new idea for the application of the BLS in target tracking. Further improvement is also needed in candidate box classification and regression.

ACKNOWLEDGEMENT

This work is supported in part by the National Natural Science Foundation of China (under Grant Nos. 51939001, 61976033, U1813203, 61803064 and 61751202), the Natural Foundation Guidance Plan Project of Liaoning (2019-ZD-0151), the Science & Technology Innovation Funds of Dalian (under Grant No. 2018J11CY022), the Fundamental Research Funds for the Central Universities (under Grant No. 3132019345) and the Dalian High-level Talents Innovation Support Program (Young Science and Technology Star Project) (under Grant No. 2021RQ067).

CONFLICT OF INTEREST

The authors declare that there is no conflict of interest that could be perceived as prejudicing the impartiality of the research reported.

DATA AVAILABILITY STATEMENT

Research data are not shared.

ORCID

Dan Zhang https://orcid.org/0000-0002-5788-8851
