

Transactions of the Chinese Society of Agricultural Engineering, 2019, No. 5

Yang Jianyu1,2, Zhou Zhenxu1, Du Zhenrong1, Xu Quanquan1, Yin Hang1, Liu Rui1

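(1. College of Land Science and Technology, China Agricultural University, Beijing 100083; 2. Key Laboratory for Agricultural Land Quality Monitoring and Control, Ministry of Land and Resources, Beijing 100035)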

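Traditional classification algorithms and shallow learning algorithms are poorly suited to extracting rural buildings from high spatial resolution remote sensing imagery. To address this problem, this paper takes a WorldView-2 high spatial resolution image of Bazhou City, Hebei Province as the data source and uses 182 064 image tiles of 128×128 pixels as training samples. SegNet, an image semantic segmentation algorithm based on a deep convolutional neural network, is selected to extract rural buildings from the imagery, and its results are compared with those of maximum likelihood (ML) and ISO clustering among the traditional classification algorithms, support vector machine (SVM) and random forest (RF) among the shallow learning algorithms, and the pyramid scene parsing network (PSPNet) among the deep semantic segmentation algorithms. The results show that SegNet not only uses the spectral information of rural buildings in high spatial resolution imagery efficiently but also makes full use of their rich spatial features, and thus yields a better classification model. On the validation sample, the algorithm achieves an overall accuracy of 96.61%, a Kappa coefficient of 0.90, and an F1 score of 0.91 for buildings, whereas the overall accuracy, Kappa coefficient, and building F1 score of the other five algorithms are all below 94.68%, 0.83, and 0.87, respectively. This study can serve as a reference for extracting rural construction land from high spatial resolution remote sensing imagery.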


0 Introduction


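In recent years, the classification algorithms commonly used for building classification have included maximum likelihood (ML)[1], ISO clustering[2], support vector machine (SVM)[3-4], random forest (RF)[5], and neural network (NN)[6-7]. However, these methods depend heavily on spectral features and make little use of spatial features, so they are poorly suited to high spatial resolution remote sensing imagery, whose spectral resolution is relatively low. Deep learning has already surpassed traditional machine learning algorithms in speech recognition[8-9], image recognition[10], information retrieval[11], and other fields. Because image semantic segmentation algorithms extract both spectral and spatial features well, a growing number of researchers have applied them to remote sensing image classification[12]. Current image semantic segmentation methods mainly include data-driven methods based on non-parametric transfer[13-18], Bayesian methods[19], Markov random fields[20], and conditional random fields[21-22], but these methods segment inefficiently and are computationally expensive. Long et al.[23] proposed the fully convolutional network (FCN), which discards the fully connected layers and thereby improves segmentation efficiency and reduces computational complexity; it is a classic semantic segmentation network. FCN-based image semantic segmentation algorithms perform particularly well in building extraction. For example, Zhang et al. proposed adaptive image segmentation and developed a multi-level classifier, which further improved building extraction accuracy[24]; Zhao et al. built a multi-scale sample pyramid from multi-scale imagery to fully exploit the spatial information in remote sensing images[25]. However, most semantic segmentation models currently used for building extraction are patch-based architectures, which, compared with pixel-based end-to-end architectures, lack a holistic understanding of the features in a sample and are less efficient[26]. Badrinarayanan et al.[27] proposed SegNet, a pixel-based end-to-end architecture that refines FCN while retaining its approach to image semantic segmentation. SegNet combines the characteristics of the encoder-decoder structure and of skip networks, so the model produces more accurate output feature maps and gives more accurate classification results even when training samples are limited.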


1 Study area and data

1.1 Overview of the study area

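The study area is located in Bazhou City, Hebei Province (Fig.1), between 116°15′ and 116°40′E and between 39°21′ and 39°50′N. It borders Xiqing District of Tianjin to the east, Xiong'an New Area to the west, Wen'an County to the south, and Gu'an and Yongqing counties to the north. The terrain of Bazhou is low and flat, sloping gently from northwest to southeast. The total land area is 784 km², of which residential and industrial-mining land covers 150.5 km² (19.20% of the total), transportation land 13.9 km² (1.77%), and water conservancy facility land 3.7 km² (0.47%).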

Fig.1 Location of Bazhou City

1.2 Data sources

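The data used in this paper are mainly the high spatial resolution remote sensing imagery covering all of Bazhou City, Hebei Province, and the land use vector data for the same area. The imagery was acquired on 26 September 2013; it is a WorldView-2 color composite with a spatial resolution of 0.5 m and three bands (RGB). The land use vector data come from the 2013 land use status change survey database. They provide a reference for organizing samples, especially for labeling ground objects, which helps make the samples more objective, accurate, and precise.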

2 Methods

2.1 SegNet model

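The overall architecture of SegNet, the semantic segmentation model based on a deep convolutional neural network used in this paper, is shown in Fig.2. The network consists mainly of an encoder network, a decoder network, and a pixel-wise classification layer, and every convolutional layer is immediately followed by a batch normalization[28] layer and a ReLU activation function.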
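The pixel-wise classification layer can be illustrated with a small sketch. It assumes the layer is a softmax over the class scores at each pixel, as the English abstract of this paper states; the function names, the two-class score values, and the plain nested-list representation are hypothetical and chosen only for illustration, not taken from the authors' implementation.

```python
import math


def pixelwise_softmax(scores):
    """scores[c][y][x] -> probs[c][y][x], normalized over classes c at each pixel."""
    n_classes = len(scores)
    height, width = len(scores[0]), len(scores[0][0])
    probs = [[[0.0] * width for _ in range(height)] for _ in range(n_classes)]
    for y in range(height):
        for x in range(width):
            pixel = [scores[c][y][x] for c in range(n_classes)]
            peak = max(pixel)  # subtract the maximum for numerical stability
            exps = [math.exp(s - peak) for s in pixel]
            total = sum(exps)
            for c in range(n_classes):
                probs[c][y][x] = exps[c] / total
    return probs


def label_map(probs):
    """Pick the most probable class index at each pixel."""
    n_classes = len(probs)
    height, width = len(probs[0]), len(probs[0][0])
    return [[max(range(n_classes), key=lambda c: probs[c][y][x])
             for x in range(width)] for y in range(height)]


# Hypothetical decoder output for a 2x2 tile (0 = non-building, 1 = building).
scores = [
    [[2.0, 1.0], [0.5, -1.0]],  # class 0 scores
    [[0.0, 0.0], [1.5, 3.0]],   # class 1 scores
]
probs = pixelwise_softmax(scores)
labels = label_map(probs)
```

Here the top row of the tile is labelled non-building and the bottom row building, and the two class probabilities sum to 1 at every pixel.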



Fig.2 Architecture of SegNet


2.2 Sample organization for SegNet

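The sample set in this experiment comprises training, test, and validation samples. From the imagery covering all of Bazhou City, Hebei Province, one image slice of 3 000×3 000 pixels and two slices of 2 000×2 000 pixels were cut out as training samples, and one slice of 3 000×3 000 pixels as the validation sample. The locations of the data samples are shown in Fig.3.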

Fig.3 Locations of the training and validation samples





2.3 Workflow of rural building extraction based on the SegNet semantic model





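Remote sensing image classification here means extracting the buildings in the validation sample with the trained semantic segmentation network. The validation sample selected in this paper measures 3 000×3 000 pixels, which is too large to feed into the model directly, so it must be cut before classification. An overlapping cutting strategy is used, with a tile size of 128×128 pixels and a stride of 32 pixels; the overlap keeps the classified image continuous and reduces stitching seams. After cutting, the resulting validation tiles are fed into the trained model for classification, the classification results are then matched pixel by pixel against the corresponding labels, and the confusion matrix is finally obtained.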
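The overlapping cutting step can be sketched as follows. This is a minimal illustration that assumes tiles are never padded and never cross the image border; the source does not describe how the border is handled, so the tile counts below follow from that assumption rather than from the paper. The function names are hypothetical.

```python
def tile_origins(size, tile=128, stride=32):
    """Top-left offsets of tiles along one axis; tiles never cross the image edge."""
    if size < tile:
        return []
    return list(range(0, size - tile + 1, stride))


def tile_grid(height, width, tile=128, stride=32):
    """All (row, col) origins of overlapping tiles over an image."""
    rows = tile_origins(height, tile, stride)
    cols = tile_origins(width, tile, stride)
    return [(r, c) for r in rows for c in cols]


# Validation sample of 3000x3000 pixels, 128x128 tiles, stride of 32 pixels.
origins = tile_grid(3000, 3000)
per_axis = len(tile_origins(3000))
# Pixels at the far edge that no tile reaches under the no-padding assumption.
uncovered = 3000 - (tile_origins(3000)[-1] + 128)
```

Under this assumption the validation sample yields 90 tile positions per axis, 8 100 tiles in total, and a 24-pixel strip along the right and bottom edges that would need padding or an extra shifted tile to be classified.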


3 Experimental design and results analysis

3.1 Sample organization

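In practice it is difficult to control the balance between positive and negative samples when selecting samples. This paper therefore increases the number of minority-class samples by duplicating them, which resolves the imbalance and raises the number of pixels in the final training samples from 17 000 000 to 25 960 000. Of these, 13 069 851 are building pixels (50.35% of the total) and 12 890 149 are non-building pixels (49.65%). This strategy also acts as data augmentation, enlarging the number of training and validation samples.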
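The pixel counts above can be cross-checked with a few lines of arithmetic. The sketch uses only numbers reported in this paper (the training slice sizes from Section 2.2 and the pixel counts after duplication); the variable and function names are hypothetical.

```python
# Training slices reported in Section 2.2: one 3000x3000 and two 2000x2000.
slice_sides = [3000, 2000, 2000]
pixels_before = sum(side * side for side in slice_sides)

# Pixel counts reported after duplicating minority-class samples.
building = 13_069_851
non_building = 12_890_149
pixels_after = building + non_building


def share(part, whole):
    """Percentage of `whole` taken by `part`, rounded to two decimals."""
    return round(100.0 * part / whole, 2)


building_share = share(building, pixels_after)
non_building_share = share(non_building, pixels_after)
growth = pixels_after / pixels_before  # factor by which duplication enlarged the set
```

The slice sizes reproduce the 17 000 000 pixels before duplication, the two class counts add up to 25 960 000, and the shares round to the reported 50.35% and 49.65%.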

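After the samples were selected, they had to be cut so that they could be fed into the semantic segmentation network and so that the cut samples would train the model as effectively as possible and improve its classification accuracy. This paper uses a tile size of 128×128 pixels and a cutting stride of 32 pixels, which both increases the amount of data and safeguards classification accuracy. Finally, 0.81% of the resulting tiles were drawn at random as test samples and the rest were kept as training samples, giving 182 064 training tiles and 1 483 test tiles of 128×128 pixels each.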
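The random hold-out of test tiles can be sketched as below. The paper gives only the resulting counts, not the sampling procedure, so the seeded `random.Random` generator and the function name `split_tiles` are assumptions made for illustration.

```python
import random


def split_tiles(n_tiles, n_test, seed=0):
    """Randomly assign tile indices to a test set of size n_test; the rest train."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    test_ids = set(rng.sample(range(n_tiles), n_test))
    train_ids = [i for i in range(n_tiles) if i not in test_ids]
    return train_ids, sorted(test_ids)


# Counts reported in the paper.
n_train_reported, n_test_reported = 182_064, 1_483
n_total = n_train_reported + n_test_reported

train_ids, test_ids = split_tiles(n_total, n_test_reported)
test_fraction = round(100.0 * len(test_ids) / n_total, 2)  # percent of all tiles
```

With 183 547 tiles in total, holding out 1 483 of them corresponds to the reported 0.81% once rounded to two decimals.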

3.2 Parameter settings for the model and the comparison methods

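Before training, the main parameters of the SegNet semantic segmentation network must be set. The learning rate controls how fast the model learns: too low a value makes convergence slow, and too high a value causes divergence; its initial value is set to 0.01. The learning rate decay factor (gamma), which controls how quickly the learning rate changes, is set to 0.1. The momentum parameter, which accelerates convergence, is set to 0.9. The weight decay, which adjusts the influence of model complexity on the loss function, is set to 0.0005. The learning rate step size (stepsize) is set to 2000. The training batch size (trainbatch) and test batch size (testbatch) are set to 25 and 15, respectively, and the number of epochs (EpochNum) is set to 10.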
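How gamma and stepsize interact can be shown with a short sketch. It assumes a step decay policy in which the learning rate is multiplied by gamma once every stepsize iterations; the paper lists the parameter values but does not state the schedule formula or whether stepsize counts iterations, so both are assumptions here, as are the function names.

```python
def step_lr(iteration, base_lr=0.01, gamma=0.1, stepsize=2000):
    """Learning rate after `iteration` updates under an assumed step decay policy."""
    return base_lr * gamma ** (iteration // stepsize)


def iterations_per_epoch(n_samples, batch_size):
    """Mini-batches needed to see every sample once (the last one may be partial)."""
    return -(-n_samples // batch_size)  # ceiling division


lr_start = step_lr(0)
lr_after_first_step = step_lr(2000)
lr_after_second_step = step_lr(4000)
batches = iterations_per_epoch(182_064, 25)  # training tiles, training batch size
```

Under these assumptions the learning rate drops from 0.01 to 0.001 after 2000 iterations and to 0.0001 after 4000, and one pass over the 182 064 training tiles takes 7 283 batches of 25.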


3.3 Accuracy evaluation metrics

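After training, the classification performance of the model must be assessed. For binary classification in particular, commonly used metrics are the Kappa coefficient, overall accuracy (OA), recall, precision, false discovery rate (FDR), and the F1 score; the F1 score is the harmonic mean of recall and precision and measures the accuracy of a binary classification model. To evaluate the classification accuracy objectively, this paper uses these six confusion-matrix-based metrics to assess the results of rural building extraction.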
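The six metrics can be computed from a binary confusion matrix as in the sketch below. It uses the standard textbook definitions, with buildings as the positive class; the function name and the example counts are hypothetical and are not results from this paper. The sketch assumes every denominator is non-zero.

```python
def binary_metrics(tp, fp, fn, tn):
    """Metrics from a 2x2 confusion matrix (positive class = building)."""
    total = tp + fp + fn + tn
    oa = (tp + tn) / total                      # overall accuracy
    precision = tp / (tp + fp)                  # predicted buildings that are correct
    recall = tp / (tp + fn)                     # true buildings that are found
    fdr = fp / (tp + fp)                        # false discovery rate = 1 - precision
    f1 = 2 * precision * recall / (precision + recall)
    # Agreement expected by chance, from the row and column totals.
    expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / total ** 2
    kappa = (oa - expected) / (1 - expected)
    return {"OA": oa, "precision": precision, "recall": recall,
            "FDR": fdr, "F1": f1, "Kappa": kappa}


# Hypothetical pixel counts, for illustration only.
metrics = binary_metrics(tp=40, fp=10, fn=5, tn=45)
```

For these example counts the overall accuracy is 0.85, precision 0.80, FDR 0.20, and Kappa 0.70, which shows how Kappa discounts the agreement that would be expected by chance.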

3.4 Results analysis


Fig.6 Comparison of building extraction results on the validation set


Table 1 Comparison of validation-set classification results for different segmentation methods






Fig.7 Details of the classification results


Fig.8 Comparison of SegNet and PSPNet network training

4 Conclusions

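This paper takes WorldView-2 high spatial resolution remote sensing imagery of Bazhou City, Hebei Province as the data source and selects SegNet, an image semantic segmentation algorithm based on a deep convolutional neural network, to extract rural buildings from the imagery. The results are compared with those of the traditional classification algorithms maximum likelihood (ML) and ISO clustering, the shallow learning algorithms support vector machine (SVM) and random forest (RF), and PSPNet, a semantic segmentation algorithm based on deep learning.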




Rural construction land extraction from high spatial resolution remote sensing image based on SegNet semantic segmentation model

Yang Jianyu1,2, Zhou Zhenxu1, Du Zhenrong1, Xu Quanquan1, Yin Hang1, Liu Rui1

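(1. College of Land Science and Technology, China Agricultural University, Beijing 100083, China; 2. Key Laboratory for Agricultural Land Quality Monitoring and Control, Ministry of Land and Resources, Beijing 100035, China)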

With the advancement of remote sensing technology, high spatial resolution remote sensing images contain rich spatial information in great detail. At the same time, the complexity of these images places higher demands on remote sensing image classification technology. Such images show more distinct geometric structure and richer texture, so designing a rational feature system and selecting an appropriate classification algorithm to grasp the amount and distribution of rural building land accurately and quickly is of great significance for balancing urban and rural development, saving land, and achieving sustainable development. It also helps in exploring the application of deep learning models to building extraction from high spatial resolution remote sensing images and is of research significance for improving the classification accuracy of such images. In this paper, the semantic segmentation model SegNet was used to extract buildings. SegNet is mainly composed of an encoder network, a decoder network, and a pixel-wise classification layer. The encoder network transforms high-dimensional vectors into low-dimensional vectors, enabling low-dimensional extraction of high-dimensional features. The decoder network maps low-resolution feature maps to high spatial resolution feature maps, reconstructing high-dimensional vectors from low-dimensional ones. The softmax classifier classifies each pixel separately and outputs the probability that each pixel belongs to each class. One slice of 3000 pixel × 3000 pixel and two slices of 2000 pixel × 2000 pixel were taken from the remote sensing image covering all of Bazhou City, Hebei Province as training samples, and one slice of 3000 pixel × 3000 pixel was taken as the validation sample.
Five comparison experiments were carried out to extract the buildings, using PSPNet, support vector machine, random forest, ISO clustering, and the maximum likelihood method. The confusion matrix of each classification method was obtained by comparing its classification results with the ground truth. From the traditional classification algorithms to the shallow learning algorithms to the deep learning algorithms, the Kappa coefficient and overall accuracy increased steadily, and the SegNet semantic segmentation algorithm based on a deep convolutional network performed better than the other five algorithms in extracting buildings from high spatial resolution remote sensing images. The Kappa coefficient and overall accuracy of SegNet were 0.90 and 96.61%, respectively, and its classification result was largely consistent with the ground truth. The F1 score of SegNet for building extraction was 0.91, whereas those of the other five algorithms were below 0.87. SegNet had the lowest error rate for buildings, 9.71%, indicating that its ability to identify buildings in high spatial resolution remote sensing images was superior to that of the traditional classification algorithms, the shallow learning algorithms, and the PSPNet semantic segmentation algorithm based on a deep convolutional network. The Kappa coefficient and overall accuracy of the other five classification algorithms were below 0.83 and 94.68%, respectively, and the difference between their classification results and the ground truth was relatively large. SegNet can not only make use of spectral information but also make full use of abundant spatial information.
During training, SegNet can learn more essential features and finally forms features better suited to pattern classification, which strengthens the convergence and generalization of the model and improves classification accuracy. Traditional classification algorithms, such as ISO clustering and the maximum likelihood method, failed to make use of the rich spatial information in high-resolution remote sensing images, so their accuracy was relatively low. Because of their limited computing units and the large volume of high spatial resolution remote sensing image data, shallow learning algorithms such as support vector machines and random forests cannot effectively express the complex features of ground objects, so they show no clear advantage in building extraction from such images. The experimental results showed that SegNet, which is based on deep learning, performed best, and exploring the application of deep learning models to remote sensing image classification is of theoretical significance. The research results also provide a reference for improving the classification accuracy of high-resolution remote sensing images.

Keywords: remote sensing; image segmentation; algorithms; deep learning; SegNet semantic segmentation model; high-resolution remote sensing image; rural construction land extraction









Yang Jianyu, Zhou Zhenxu, Du Zhenrong, Xu Quanquan, Yin Hang, Liu Rui. Rural construction land extraction from high spatial resolution remote sensing image based on SegNet semantic segmentation model[J]. Transactions of the Chinese Society of Agricultural Engineering (Transactions of the CSAE), 2019, 35(5): 251-258. (in Chinese with English abstract) doi: 10.11975/j.issn.1002-6819.2019.05.031


