Rural construction land extraction from high-resolution remote sensing images based on the SegNet semantic segmentation model

2019-04-26

Transactions of the Chinese Society of Agricultural Engineering, 2019, No. 5

Yang Jianyu1,2, Zhou Zhenxu1, Du Zhenrong1, Xu Quanquan1, Yin Hang1, Liu Rui1

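(1. College of Land Science and Technology, China Agricultural University, Beijing 100083, China; 2. Key Laboratory of Agricultural Land Quality and Monitoring, Ministry of Land and Resources, Beijing 100035, China)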

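Abstract: Traditional classification algorithms and shallow learning algorithms are poorly suited to extracting rural buildings from high spatial resolution remote sensing imagery. To address this problem, this study used WorldView-2 high spatial resolution imagery of Bazhou City, Hebei Province, as the data source and 182 064 image tiles of 128×128 pixels as training samples, and applied SegNet, an image semantic segmentation algorithm based on deep convolutional neural networks, to extract rural buildings. The results were compared with those of the traditional classifiers maximum likelihood (ML) and ISO clustering, the shallow learning algorithms support vector machine (SVM) and random forest (RF), and the deep semantic segmentation network pyramid scene parsing network (PSPNet). The results show that SegNet exploits not only the spectral information of rural buildings in high spatial resolution imagery but also their rich spatial features, yielding a better classification model. On the validation sample, SegNet achieved an overall accuracy of 96.61%, a Kappa coefficient of 0.90, and a building F1 score of 0.91, whereas the overall accuracy, Kappa coefficient, and building F1 score of the other five algorithms were at most 94.68%, 0.83, and 0.87, respectively. This study can serve as a reference for rural construction land extraction from high spatial resolution remote sensing imagery.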

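Keywords: remote sensing; image segmentation; algorithms; deep learning; SegNet semantic segmentation model; high spatial resolution remote sensing image; rural construction land extraction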

0 Introduction

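With the continuous development of remote sensing technology, high spatial resolution imagery carries increasingly rich and detailed spatial information, but its complexity also places higher demands on classification techniques. Given the more pronounced geometric structure and richer texture of such imagery, designing a sound feature system and choosing a suitable classification model to determine the quantity and distribution of rural construction land accurately and quickly is important for integrated urban-rural development, economical and intensive land use, and sustainable development. It is also of research value for exploring the use of deep learning models in building classification from high spatial resolution imagery.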

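In recent years, the algorithms commonly used for building classification include maximum likelihood (ML) [1], ISO clustering [2], support vector machine (SVM) [3-4], random forest (RF) [5], and neural networks (NN) [6-7]. However, these methods rely heavily on spectral features and make little use of spatial features, so they are poorly suited to high spatial resolution imagery, whose spectral resolution is comparatively low. Deep learning has already surpassed traditional machine learning algorithms in fields such as speech recognition [8-9], image recognition [10], and information retrieval [11], and the strong ability of image semantic segmentation algorithms to extract spectral and spatial features has led more researchers to apply them to remote sensing image classification [12]. Existing semantic segmentation methods include data-driven methods based on nonparametric transfer [13-18], Bayesian methods [19], Markov random fields [20], and conditional random fields [21-22], but these methods are inefficient and computationally expensive. Long et al. [23] proposed the fully convolutional network (FCN), a classic semantic segmentation network that discards the fully connected layers, which improves segmentation efficiency and reduces computational complexity. FCN-based semantic segmentation algorithms perform particularly well in building extraction: Zhang et al. proposed adaptive image segmentation and developed a multi-level classifier that further improved building extraction accuracy [24], and Zhao et al. built multiscale sample pyramids from multiscale images to fully exploit the spatial information in remote sensing imagery [25]. However, most semantic segmentation models currently used for building extraction are patch-based; compared with pixel-based end-to-end architectures, they lack a holistic understanding of the features in a sample and are less efficient [26]. Badrinarayanan et al. [27] proposed SegNet, a pixel-based end-to-end architecture that refines FCN and follows its approach to semantic segmentation. SegNet combines an encoder-decoder structure with skip connections (here, the transfer of max pooling indices from each encoder layer to its decoder), which lets it produce more accurate output feature maps and more accurate classification results even when training samples are limited.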

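To address these problems, this paper applies the deep-learning-based SegNet semantic segmentation model to analyze and automatically extract the spectral and spatial features of rural buildings in remote sensing imagery, forming a complete processing workflow that automates the path from raw input to final output as far as possible. The aim is to provide technical support for analyzing the distribution patterns of rural buildings and for implementing economical and intensive land use, and to offer a reference for improving building classification accuracy in high spatial resolution imagery.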

1 Study area and data

1.1 Study area

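The study area is Bazhou City, Hebei Province (Fig.1), located between 116°15′E and 116°40′E and between 39°21′N and 39°50′N. It borders Xiqing District of Tianjin to the east, Xiong'an New Area to the west, Wen'an County to the south, and Gu'an and Yongqing counties to the north. The terrain is low and flat, sloping gently from northwest to southeast. The total land area is 784 km², of which residential and industrial/mining land covers 150.5 km² (19.20% of the total), transportation land 13.9 km² (1.77%), and water conservancy facility land 3.7 km² (0.47%).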

Fig.1 Location of Bazhou City

1.2 Data sources

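The data used in this study are WorldView-2 high spatial resolution imagery covering all of Bazhou City, Hebei Province, and land use vector data for the same area. The imagery was acquired on 26 September 2013 as a true-color composite with a spatial resolution of 0.5 m and three bands (RGB). The land use vector data come from the 2013 land use change survey database; they serve as a reference for organizing samples, in particular for labeling ground objects, which makes the samples more objective and accurate.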

2 Methods

2.1 SegNet model

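Fig.2 shows the overall architecture of SegNet, the semantic segmentation model based on deep convolutional neural networks used in this paper. The network consists of an encoder network, a decoder network, and a pixel-wise classification layer, and every convolutional layer is followed by a batch normalization layer [28] and a ReLU activation.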
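As a rough illustration of the batch normalization + ReLU unit that follows each convolution, the NumPy sketch below normalizes a batch of feature maps per channel and then sets negative values to zero. It assumes training-mode batch statistics (at inference, running averages would replace them); the function name `batch_norm_relu` and the random array standing in for a convolution output are hypothetical.

```python
import numpy as np

def batch_norm_relu(x, gamma, beta, eps=1e-5):
    """Batch normalization (training-mode statistics) followed by ReLU.

    x: feature maps shaped (N, C, H, W); gamma, beta: per-channel scale and
    shift shaped (C,). Mean and variance are taken over the batch and both
    spatial axes, giving one pair of statistics per channel.
    """
    mean = x.mean(axis=(0, 2, 3), keepdims=True)
    var = x.var(axis=(0, 2, 3), keepdims=True)
    x_hat = (x - mean) / np.sqrt(var + eps)          # zero mean, unit variance per channel
    y = gamma.reshape(1, -1, 1, 1) * x_hat + beta.reshape(1, -1, 1, 1)
    return np.maximum(y, 0.0)                        # ReLU: negatives become 0

rng = np.random.default_rng(0)
conv_out = rng.normal(loc=3.0, scale=2.0, size=(4, 8, 16, 16))  # stand-in for a conv output
out = batch_norm_relu(conv_out, gamma=np.ones(8), beta=np.zeros(8))
print(out.shape, (out == 0).mean())  # roughly half of the activations are zeroed
```

With `gamma = 1` and `beta = 0`, each channel is centered before the ReLU, so about half of the activations are cut off; learned `gamma` and `beta` let the network shift that threshold per channel.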

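The encoder network transforms high-dimensional vectors into low-dimensional ones, extracting low-dimensional representations of high-dimensional features. Repeated max pooling in the encoder captures more translation-invariant features, but it also discards boundary information in the feature maps, which is an important cue for segmentation. SegNet therefore records the max pooling indices during pooling, i.e. the location of the maximum value in each pooling window; these indices are later used to upsample the feature maps, so that boundary information is preserved.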
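A minimal NumPy sketch of max pooling that also records the pooling indices, assuming a single-channel feature map, a 2×2 window with stride 2, and indices stored as flat positions in the input map (the function name is hypothetical):

```python
import numpy as np

def max_pool_with_indices(x, k=2):
    """k x k max pooling with stride k on a single-channel map shaped (H, W).

    Returns the pooled map and, for each pooled cell, the flat index
    (row * W + col) of the input pixel that held the maximum.
    """
    H, W = x.shape
    assert H % k == 0 and W % k == 0, "sketch assumes H and W are multiples of k"
    # Group pixels into non-overlapping windows: shape (H/k, W/k, k*k).
    windows = (x.reshape(H // k, k, W // k, k)
                .transpose(0, 2, 1, 3)
                .reshape(H // k, W // k, k * k))
    arg = windows.argmax(axis=2)          # position of the max inside each window
    pooled = windows.max(axis=2)
    # Convert window-local positions into row/col positions in the input map.
    rows = np.arange(H // k)[:, None] * k + arg // k
    cols = np.arange(W // k)[None, :] * k + arg % k
    return pooled, rows * W + cols

fmap = np.array([[1, 5, 2, 0],
                 [3, 4, 8, 1],
                 [0, 2, 1, 1],
                 [6, 1, 3, 9]])
pooled, indices = max_pool_with_indices(fmap)
print(pooled)    # [[5 8]
                 #  [6 9]]
print(indices)   # [[ 1  6]
                 #  [12 15]]
```

Only one small integer per pooled cell has to be kept, which is much cheaper than storing the full encoder feature maps for the decoder.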

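The decoder network uses the max pooling indices stored by the corresponding encoder layers during downsampling to map low-resolution feature maps to high-resolution ones, reconstructing high-dimensional vectors from low-dimensional ones. Reusing the max pooling indices in decoding has several advantages: it improves boundary delineation; it reduces the number of parameters and allows end-to-end training; and this form of upsampling can be incorporated into any encoder-decoder network [29-33]. The final decoder layer outputs high-dimensional feature representations, which are fed to a trainable softmax classifier.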
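The corresponding decoder step, max unpooling, can be sketched as follows under the same assumptions (flat indices into a single-channel map; hypothetical function name). Each pooled value is written back to the position recorded by its index and all other positions stay zero; in SegNet the subsequent decoder convolutions turn this sparse map into a dense one. The inputs below are the pooled values and indices of a 4×4 example whose 2×2-window maxima were 5, 8, 6 and 9.

```python
import numpy as np

def max_unpool(pooled, indices, out_shape):
    """Max unpooling: write each pooled value back to the input position
    recorded by its pooling index; every other position is zero."""
    out = np.zeros(out_shape, dtype=pooled.dtype)
    out.flat[indices.ravel()] = pooled.ravel()
    return out

# Pooled values and flat indices from a 4x4 map whose window maxima
# sat at flat positions 1, 6, 12 and 15.
pooled = np.array([[5, 8], [6, 9]])
indices = np.array([[1, 6], [12, 15]])
print(max_unpool(pooled, indices, (4, 4)))
```

Because the indices are reused rather than learned, this upsampling step itself adds no trainable parameters.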

Fig.2 Architecture of SegNet

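The softmax classifier classifies each pixel independently and outputs the probability that the pixel belongs to each class; the class with the highest probability is the pixel's predicted label.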
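The per-pixel softmax and arg-max decision can be sketched in NumPy as follows, assuming the decoder output is a (classes, height, width) array of scores; the two-class 2×2 `logits` array is made up for illustration.

```python
import numpy as np

def pixelwise_softmax(logits):
    """Softmax over the class axis of decoder scores shaped (C, H, W)."""
    z = logits - logits.max(axis=0, keepdims=True)   # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=0, keepdims=True)

# Hypothetical scores for 2 classes (0 = non-building, 1 = building) on a 2x2 tile.
logits = np.array([[[2.0, 0.1], [0.5, -1.0]],
                   [[0.0, 1.5], [0.2,  2.0]]])
probs = pixelwise_softmax(logits)
labels = probs.argmax(axis=0)   # class with the highest probability at each pixel
print(labels)
```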

2.2 Sample organization for SegNet

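The sample set comprises training, test, and validation samples. From the imagery of Bazhou City, one 3 000×3 000-pixel subset and two 2 000×2 000-pixel subsets were cropped as training samples, and one 3 000×3 000-pixel subset was cropped as the validation sample; their locations are shown in Fig.3.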

Fig.3 Locations of training and validation samples

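This study targets rural construction land. Because high spatial resolution imagery reveals fine detail, rural buildings show complex and variable spectral characteristics, with large intra-class and small inter-class spectral differences, which makes them difficult to extract. Sample regions were therefore chosen to cover a wide range of building spectral characteristics, so that the samples are representative, overfitting is avoided, and the model generalizes better. In addition, studies have shown that semantic segmentation models based on deep convolutional neural networks are highly sensitive to the balance between positive and negative samples in binary classification, and that a balanced dataset can substantially improve classifier performance [34-35]. Because balanced positive and negative samples are difficult to obtain in practice, this study balances them by oversampling the minority class.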
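A minimal sketch of random oversampling by replication, the general strategy named above. What counts as one "sample" (a tile, an image region, or a pixel) is an assumption here; Section 3.1 reports the resulting balance in pixels. The function name and toy data are hypothetical.

```python
import random

def oversample_minority(samples, labels, seed=0):
    """Random oversampling by replication: copy randomly chosen samples of the
    smaller class(es) until every class has as many samples as the largest."""
    rng = random.Random(seed)
    by_class = {}
    for sample, label in zip(samples, labels):
        by_class.setdefault(label, []).append(sample)
    target = max(len(group) for group in by_class.values())
    balanced = []
    for label, group in by_class.items():
        extra = [rng.choice(group) for _ in range(target - len(group))]
        balanced.extend((sample, label) for sample in group + extra)
    rng.shuffle(balanced)
    return balanced

# Toy data: 3 "building" samples (label 1) and 7 "non-building" samples (label 0).
balanced = oversample_minority(list(range(10)), [1] * 3 + [0] * 7)
print(sum(1 for _, y in balanced if y == 1), sum(1 for _, y in balanced if y == 0))  # 7 7
```

Replication adds no new information, so it mainly changes how often the minority class contributes to the loss; this is why the balanced set also needs representative sample regions to limit overfitting.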

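The cropped training samples are too large to be fed into the network directly with the available hardware and software, so they must be cut into tiles before training. In high spatial resolution imagery, rural houses appear as compact rectangles and hardened rural roads as regular strips. If the tiles are too small, the spatial structure of rural buildings is broken, which reduces the model's generalization ability and classification accuracy. If the tiles are too large, the spatial structure is preserved intact, but training consumes a large amount of memory, GPU memory, and time; moreover, larger tiles yield fewer samples, making overfitting more likely. Tile size therefore affects both training efficiency and final classification performance [36]. Multiscale semantic segmentation experiments on high spatial resolution imagery were conducted to find the best tile size for different ground objects; as shown in Fig.4, building classification accuracy peaked at 85.19% with 128×128-pixel tiles and a 32-pixel stride. Because deep semantic segmentation models need large training sets to avoid overfitting, overlapping tiling with a 32-pixel stride was also used to enlarge the dataset as a form of data augmentation.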
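The overlapping tiling can be sketched as a list of tile origins. This sketch assumes that tiles which would extend past the image edge are simply skipped (no padding), which the paper does not specify; the function name is hypothetical. With 128-pixel tiles and a 32-pixel stride, a 3 000-pixel side yields 90 tile positions and a 2 000-pixel side yields 59, and each interior pixel is covered by (128/32)² = 16 tiles. These are single-pass counts, before the oversampling and train/test split of Section 3.1, so they are not expected to match the final 182 064 training tiles.

```python
def tile_origins(height, width, size=128, stride=32):
    """Top-left (row, col) corners of overlapping size x size tiles placed every
    `stride` pixels; tiles that would cross the image border are skipped."""
    rows = range(0, height - size + 1, stride)
    cols = range(0, width - size + 1, stride)
    return [(r, c) for r in rows for c in cols]

for side in (3000, 2000):
    origins = tile_origins(side, side)
    print(side, len(origins), origins[-1])
# 3000 8100 (2848, 2848)
# 2000 3481 (1856, 1856)
```

Each origin `(r, c)` selects the slice `image[r:r + 128, c:c + 128]`; the same origins applied to the label raster keep images and labels aligned.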

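Validation samples were organized in the same way as training samples: the spatial and spectral characteristics of buildings were taken into account, and the samples were cut with overlap, which here reduces stitching artifacts in the classified output and improves the result.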

Note: 64×64×16 denotes tiles of 64×64 pixels cut with a stride of 16 pixels.

2.3 Rural building extraction workflow based on SegNet

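Fig.5 shows the main technical workflow for extracting rural buildings with the deep-learning-based SegNet model. It consists of four stages: sample preprocessing, model training, image classification, and comparative analysis of the segmentation results.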

Note: The test set is used to tune the model parameters.

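Sample preprocessing mainly includes producing sample labels, tiling with overlapping sampling, and data format conversion, so that the preprocessed samples can be fed into the semantic segmentation network for training.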

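Model training comprises setting the training parameters beforehand and the self-optimization of the network during training. After the parameters are set, a large number of preprocessed training samples are fed into the deep network, which applies many combined nonlinear transformations to obtain high-level abstract features and passes them to the output layer. The difference between the network output and the ground truth is then computed, and the weight matrices are updated to minimize this difference, thereby optimizing the model.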
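The weight update described above is typically stochastic gradient descent. The sketch below shows one momentum update with L2 weight decay, written in the form used by Caffe's SGD solver (the paper does not name its framework, so this formulation is an assumption), together with a toy run on the one-parameter loss (w - 3)², whose gradient is 2(w - 3). The function name is hypothetical.

```python
def sgd_momentum_step(weights, grads, velocity, lr=0.01, momentum=0.9, weight_decay=0.0005):
    """One SGD update with momentum and L2 weight decay:
    v <- momentum * v - lr * (grad + weight_decay * w);  w <- w + v.
    Works element-wise on lists of scalar parameters."""
    new_w, new_v = [], []
    for w, g, v in zip(weights, grads, velocity):
        v = momentum * v - lr * (g + weight_decay * w)
        new_w.append(w + v)
        new_v.append(v)
    return new_w, new_v

# Toy problem: minimize (w - 3)^2, whose gradient is 2 * (w - 3).
w, v = [0.0], [0.0]
for _ in range(500):
    grads = [2.0 * (w[0] - 3.0)]
    w, v = sgd_momentum_step(w, grads, v)
print(w[0])  # close to 6 / 2.0005, slightly below 3 because of weight decay
```

The weight decay term pulls the solution slightly toward zero (here to 6/2.0005 instead of 3), which is how it penalizes model complexity in the loss.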

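Image classification uses the trained network to extract buildings from the validation sample. The validation sample (3 000×3 000 pixels) is too large to be input directly, so it is first cut into 128×128-pixel tiles with a 32-pixel stride; this overlap keeps the classified image continuous and reduces stitching artifacts. The tiles are then classified by the trained model, and the classification result is compared pixel by pixel with the corresponding labels to obtain a confusion matrix.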
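The paper does not state how predictions from overlapping tiles are merged. One common choice, assumed in the sketch below, is to average the building probabilities of all tiles covering a pixel and then threshold the averaged map; the function name, the 6×6 toy image, and the tile values are hypothetical.

```python
import numpy as np

def stitch_probabilities(tile_probs, origins, height, width, size=128):
    """Average per-tile building probabilities into one full-size map.

    tile_probs: list of (size, size) arrays; origins: matching (row, col)
    top-left corners. Pixels covered by several tiles receive the mean of
    their predictions; `covered` marks pixels that received any prediction.
    """
    total = np.zeros((height, width))
    count = np.zeros((height, width))
    for prob, (r, c) in zip(tile_probs, origins):
        total[r:r + size, c:c + size] += prob
        count[r:r + size, c:c + size] += 1
    covered = count > 0
    total[covered] /= count[covered]
    return total, covered

# Toy example: a 6x6 image, 4x4 tiles, stride 2, each tile predicting a constant probability.
origins = [(r, c) for r in range(0, 3, 2) for c in range(0, 3, 2)]
tiles = [np.full((4, 4), p) for p in (0.0, 0.4, 0.8, 1.0)]
prob_map, covered = stitch_probabilities(tiles, origins, 6, 6, size=4)
building_mask = prob_map >= 0.5      # threshold the averaged probabilities
print(prob_map[2, 2], building_mask.sum())
```

Averaging smooths the disagreements between neighbouring tiles, which are largest near tile borders where the network sees the least context.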

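Finally, the SegNet results are compared with those of the control experiments. The five control experiments extract buildings with the PSPNet semantic segmentation network [37], SVM, RF, ISO clustering, and ML. For each method, a confusion matrix is obtained by comparing the classified image with its labels, and the accuracies of all methods are then compared and analyzed.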

3 Experimental design and result analysis

3.1 Sample organization

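Because the balance of positive and negative samples is hard to control when selecting samples, minority-class samples were duplicated to increase their number. This raised the number of pixels in the training samples from 17 000 000 to 25 960 000, of which 13 069 851 (50.35%) are building pixels and 12 890 149 (49.65%) are non-building pixels. This strategy also served as data augmentation, enlarging the training and validation sets.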

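To ensure that the samples can be fed into the network and that the tiles give the best training results, and hence higher classification accuracy, the samples were cut into 128×128-pixel tiles with a 32-pixel stride, which both increases the amount of data and preserves classification accuracy. Finally, 0.81% of the resulting tiles were randomly drawn as test samples and the rest were used for training, giving 182 064 training tiles and 1 483 test tiles of 128×128 pixels.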

3.2 Parameter settings for the model and the comparison methods

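Before training, the main SegNet parameters were set as follows. The learning rate controls the pace of learning: too low a rate slows convergence and too high a rate causes divergence; its initial value was set to 0.01. The learning-rate decay factor (gamma), which controls how fast the learning rate changes, was set to 0.1. The momentum, which accelerates convergence, was set to 0.9. The weight decay, which regulates the influence of model complexity on the loss function, was set to 0.0005. The learning-rate step size (stepsize) was set to 2000. The training and test batch sizes (trainbatch, testbatch) were set to 25 and 15, respectively, and the number of epochs (EpochNum) to 10.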
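Read as a Caffe-style "step" learning-rate policy (an assumption suggested by the parameter names gamma and stepsize), the settings above give lr = base_lr × gamma^floor(iteration / stepsize). The sketch below evaluates this schedule and, assuming one iteration per training batch with the last partial batch kept, the number of iterations per epoch for 182 064 training tiles and a batch size of 25. The function name is hypothetical.

```python
import math

def step_lr(iteration, base_lr=0.01, gamma=0.1, stepsize=2000):
    """'Step' learning-rate policy: multiply the rate by gamma every `stepsize` iterations."""
    return base_lr * gamma ** (iteration // stepsize)

iters_per_epoch = math.ceil(182_064 / 25)   # training tiles / training batch size
print(iters_per_epoch)                      # 7283
for it in (0, 1999, 2000, 4000, 6000):
    print(it, step_lr(it))
```

Under these assumptions the rate drops by a factor of 10 several times within the first epoch, so most of the 10 epochs would run at a very small learning rate.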

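The five control experiments extract buildings with PSPNet, SVM, RF, ISO clustering, and ML. For PSPNet, the learning rate, learning-rate decay factor, momentum, weight decay, learning-rate step size, and number of training iterations were set to 0.01, 0.1, 0.9, 0.00001, 2000, and 20, respectively. For SVM classification, the maximum number of samples per class was set to 500; for RF, the maximum number of trees was set to 100 and the maximum tree depth to 30; for ISO clustering, the number of clusters was set to 2.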

3.3 Accuracy evaluation metrics

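After training, the classification performance of the model must be assessed. Common metrics, especially for binary classification, include the Kappa coefficient, overall accuracy (OA), recall, precision, false discovery rate (FDR), and F1 score; the F1 score is the harmonic mean of recall and precision and measures the accuracy of a binary classifier. To evaluate the classification objectively, this paper uses these six confusion-matrix-based metrics to assess the rural building extraction results.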
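The six metrics can be computed from a 2×2 confusion matrix for the building class as sketched below, using the standard definitions (Kappa = (p_o - p_e)/(1 - p_e), with p_e the chance agreement from the marginals). The counts in the usage example are made up for illustration and are not the paper's results; the function name is hypothetical.

```python
def binary_metrics(tp, fp, fn, tn):
    """Accuracy measures for the building class from a 2x2 confusion matrix.

    tp: building pixels labeled building; fp: non-building labeled building;
    fn: building labeled non-building; tn: non-building labeled non-building.
    """
    n = tp + fp + fn + tn
    oa = (tp + tn) / n                                   # overall accuracy (observed agreement)
    # Chance agreement from predicted (row) and reference (column) totals.
    pe = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n ** 2
    kappa = (oa - pe) / (1 - pe)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    fdr = fp / (tp + fp)                                 # false discovery rate = 1 - precision
    f1 = 2 * precision * recall / (precision + recall)   # harmonic mean of precision and recall
    return {"OA": oa, "Kappa": kappa, "precision": precision,
            "recall": recall, "FDR": fdr, "F1": f1}

m = binary_metrics(tp=80, fp=10, fn=20, tn=890)   # hypothetical pixel counts
print({k: round(v, 4) for k, v in m.items()})
```

Note that OA can stay high when non-building pixels dominate, whereas Kappa and the building F1 score are more sensitive to errors on the building class itself.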

3.4 Result analysis

The classification results on the validation set are shown in Fig.6.

Fig.6 Comparison of building extraction results on the validation set

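Moving from traditional classification algorithms to shallow learning algorithms and then to semantic segmentation algorithms, the Kappa coefficient and building F1 score increase steadily and the overall accuracy generally rises. SegNet performs best in extracting buildings from high spatial resolution imagery, followed by PSPNet (Table 1). SegNet achieves a Kappa coefficient of 0.90, an overall accuracy of 96.61%, and a building F1 score of 0.91, and its classification map agrees well with the ground-truth labels; the Kappa coefficient, overall accuracy, and building F1 score of the other five algorithms are at most 0.83, 94.68%, and 0.87, respectively, and their classification maps agree less well with the labels. SegNet also has the lowest building FDR, only 9.71%, indicating that it identifies buildings in high spatial resolution imagery better than the other five algorithms.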

Table 1 Comparison of validation-set classification results of different segmentation methods

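Traditional algorithms that classify pixels by their spectral statistics, such as ISO clustering and ML, suffer from the "same object, different spectra; different objects, same spectrum" problem and do not use the rich spatial information in the image, such as relationships between pixels, so they achieve low accuracy in extracting rural buildings from high spatial resolution imagery. ISO clustering yields an overall accuracy of 90.66%, a Kappa coefficient of only 0.65, and a building F1 score of 0.71; because buildings and bare ground have similar spectra, many buildings are missed (Fig.7a) and the building recall is only 60.59%, so ISO clustering identifies buildings poorly in high spatial resolution imagery. ML yields an overall accuracy of 83.81%, a Kappa coefficient of 0.56, and a building F1 score of 0.66; because buildings have spectra similar to those of bare ground, water, and shadows, these non-building objects are frequently misclassified as buildings (Fig.7b), and the building FDR reaches 45.67%, the worst result among the five comparison methods.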

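Shallow machine learning algorithms such as SVM and RF use not only the spectral information of pixels but also spatial information such as relationships between pixels. However, with their limited computational units, and given the large data volume and complex, diverse ground-object characteristics of high spatial resolution imagery, they cannot represent complex ground-object features effectively. RF yields an overall accuracy of 90.16%, a Kappa coefficient of 0.72, and a building F1 score of 0.78; a small amount of bare ground is misclassified as buildings (Fig.7c), but water and shadows are almost never misclassified as buildings. RF thus performs considerably better than the traditional algorithms, with an FDR 12.85 percentage points lower than that of ML. SVM yields an overall accuracy of 90.87%, a Kappa coefficient of 0.74, and a building F1 score of 0.79, essentially the same as RF (Fig.7d). For building extraction from high spatial resolution imagery, shallow machine learning algorithms therefore clearly outperform algorithms that rely solely on spectral statistics.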

Note: Frames in the left images mark misclassified ground objects; black frames in the right images mark the corresponding misclassification results.

Fig.7 Details of classification results

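During training, SegNet's decoder performs upsampling followed by convolution through 13 convolutional layers, each convolving only its corresponding feature maps, which reduces the number of trainable parameters and saves computing resources. As shown in Fig.8, SegNet trains more stably than PSPNet: as the number of iterations increases, its loss drops rapidly and then levels off, its accuracy rises quickly and then stabilizes, and it converges faster, eventually forming features well suited to pattern classification, which strengthens the model's convergence and generalization and improves classification accuracy. SegNet achieves an overall accuracy of 96.61%, a Kappa coefficient of 0.90, and a building F1 score of 0.91; as shown in Fig.6g, its classification map is nearly identical to the ground-truth labels, and its building FDR is the lowest, at only 9.71%. PSPNet achieves an overall accuracy of 94.68%, a Kappa coefficient of 0.83, and a building F1 score of 0.87; as shown in Fig.6h, its building boundaries are rather blurred, and its building FDR is 18.89%, 9.18 percentage points higher than that of SegNet. Compared with shallow machine learning algorithms, the deep convolutional semantic segmentation models SegNet and PSPNet further improve building extraction from high spatial resolution imagery, and SegNet's advantage is the more pronounced of the two.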

Fig.8 Training comparison of SegNet and PSPNet

4 Conclusions

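Using WorldView-2 high spatial resolution imagery of Bazhou City, Hebei Province, as the data source, this paper applied SegNet, an image semantic segmentation algorithm based on deep convolutional neural networks, to extract rural buildings, and compared the results with those of the traditional classifiers ML and ISO clustering, the shallow learning algorithms SVM and RF, and the deep-learning semantic segmentation algorithm PSPNet.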

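1) SVM, RF, ISO clustering, and ML achieve low accuracy in classifying rural buildings in high-resolution imagery: their Kappa coefficients are 0.74, 0.72, 0.65, and 0.56, and their overall accuracies are 90.87%, 90.16%, 90.66%, and 83.81%, respectively. These algorithms are therefore not well suited to extracting rural buildings from high spatial resolution imagery.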

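2) The deep-learning semantic segmentation algorithms SegNet and PSPNet achieve Kappa coefficients of 0.90 and 0.83 and overall accuracies of 96.61% and 94.68%, respectively. SegNet also has a lower FDR, trains more stably, and converges faster, eventually forming features well suited to pattern classification and improving classification accuracy. Compared with the other methods in this paper, SegNet is therefore better suited to extracting rural buildings from high spatial resolution imagery.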

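Deep-learning semantic segmentation models have great potential in remote sensing image classification, but they are still an emerging technique, and this study has limitations: for example, the choice of network model and the training parameter settings lack a sound theoretical basis. Future work will focus on how to choose a suitable semantic segmentation model and set optimal training parameters according to the characteristics of the imagery or the target classes.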

[1] Liu Huanjun, Yang Haoxuan, Xu Mengyuan, et al. Soil classification based on maximum likelihood method and features of multi-temporal remote sensing images in bare soil period[J]. Transactions of the Chinese Society of Agricultural Engineering (Transactions of the CSAE), 2018, 34(14): 132－139. (in Chinese with English abstract)

[2] Tari G, Jessen L, Kennelly P, et al. Surface mapping of the Milh Kharwah salt diapir to better understand the subsurface petroleum system in the Sab’atayn Basin, onshore Yemen[J]. Arabian Journal of Geosciences, 2018, 11(15): 428－438.

[3] Zhu Haizhou, Jia Yinshan. Remote sensing image classification based on support vector machine[J]. Science Technology and Engineering, 2010, 10(15): 3659－3663. (in Chinese with English abstract)

[4] Chen Yuan. Remote sensing image classification based on support vector machine[J]. China Science and Technology Information, 2015(17): 21－22. (in Chinese with English abstract)

[5] Chen Yuanpeng, Luo Ming, Peng Junhuan, et al. Classification of land use in industrial and mining reclamation area based on grid-search and random forest classifier[J]. Transactions of the Chinese Society of Agricultural Engineering (Transactions of the CSAE), 2017, 33(14): 250－257. (in Chinese with English abstract)

[6] Wang Chongchang, Wu Wenbo, Zhang Jianping. Remote sensing image classification method based on BP neural network[J]. Journal of Liaoning Technical University (Natural Science), 2009, 28(1): 32－35. (in Chinese with English abstract)

[7] Du Yejun, Zhou Su, Siqinqiqige, et al. Application and comparative study of artificial neural network in remote sensing image classification[J]. Science of Surveying and Mapping, 2010(S1): 120－121. (in Chinese with English abstract)

[8] Yin Baocai, Wang Wentong, Wang Lichun. Review of deep learning[J]. Journal of Beijing University of Technology, 2015, 41(1): 48－59. (in Chinese with English abstract)

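[9] Yu Dong. Deep Learning Explained: Speech Recognition in Practice[M]. Beijing: Publishing House of Electronics Industry, 2016. (in Chinese)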

[10] Li Wei. Research and application of deep learning in image recognition[D]. Wuhan: Wuhan University of Technology, 2014. (in Chinese with English abstract)

[11] Sun Zhijun, Xue Lei, Xu Yangming, et al. Overview of deep learning[J]. Application Research of Computers, 2012, 29(8): 2806－2810. (in Chinese with English abstract)

[12] Hu F, Xia G S, Hu J W, et al. Transferring deep convolutional neural networks for the scene classification of high-resolution remote sensing imagery[J]. Remote Sensing, 2015, 7(11): 14680－14707.

[13] Liu C, Yuen J, Torralba A, et al. Sift flow: Dense correspondence across different scenes[C]//European conference on computer vision. Springer, Berlin, Heidelberg, 2008: 28－42.

[14] Liu C, Yuen J, Torralba A. Nonparametric scene parsing: Label transfer via dense scene alignment[C]// 2009 IEEE Conference on Computer Vision and Pattern Recognition. IEEE, 2009: 1972－1979.

[15] Tighe J, Lazebnik S. Superparsing: Scalable nonparametric image parsing with superpixels[C]// European conference on computer vision. Springer, Berlin, Heidelberg, 2010: 352－365.

[16] Eigen D, Fergus R. Nonparametric image parsing using adaptive neighbor sets[C]// 2012 IEEE Conference on Computer Vision and Pattern Recognition. IEEE, 2012: 2799－2806.

[17] Singh G, Kosecka J. Nonparametric scene parsing with adaptive feature relevance and semantic context[C]// Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2013: 3151－3157.

[18] Yang J, Price B, Cohen S, et al. Context driven scene parsing with attention to rare classes[C]// Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2014: 3294－3301.

[19] Feng X, Williams C K I, Felderhof S N. Combining belief networks and neural networks for scene segmentation[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2002, 24(4): 467－483.

[20] Kumar S, Hebert M. Man-made structure detection in natural images using a causal multiscale random field[C]// 2003 IEEE Conference on Computer Vision and Pattern Recognition. IEEE, 2003: 119.

[21] Long J, Shelhamer E, Darrell T. Fully convolutional networks for semantic segmentation[C]// Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2015: 3431－3440.

[22] Sultani W, Mokhtari S, Yun H B. Automatic pavement object detection using superpixel segmentation combined with conditional random field[J]. IEEE Transactions on Intelligent Transportation Systems, 2018, 19(7): 2076－2085.

[23] Long J, Shelhamer E, Darrell T. Fully convolutional networks for semantic segmentation[C]// Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2015: 3431－3440.

[24] Zhang X, Du S. Learning selfhood scales for urban land cover mapping with very-high-resolution satellite images[J]. Remote Sensing of Environment, 2016, 178: 172－190.

[25] Zhao W, Du S. Learning multiscale and deep representations for classifying remotely sensed imagery[J]. ISPRS Journal of Photogrammetry and Remote Sensing, 2016, 113: 155－165.

[26] Volpi M, Tuia D. Dense semantic labeling of subdecimeter resolution images with convolutional neural networks[J]. IEEE Transactions on Geoscience and Remote Sensing, 2017, 55(2): 881－893.

[27] Badrinarayanan V, Kendall A, Cipolla R. SegNet: A deep convolutional encoder-decoder architecture for scene segmentation[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017, 39: 2481－2495.

[28] Ioffe S, Szegedy C. Batch Normalization: Accelerating deep network training by reducing internal covariate shift[C]// International Conference on International Conference on Machine Learning. 2015.

[29] Zheng S, Jayasumana S, Romera-Paredes B, et al. Conditional random fields as recurrent neural networks[C]// Proceedings of the IEEE International Conference on Computer Vision. 2015: 1529－1537.

[30] Badrinarayanan V, Handa A, Cipolla R. SegNet: A deep convolutional encoder-decoder architecture for robust semantic pixel-wise labelling[J]. Computer Science, 2015.

[31] Eigen D, Fergus R. Predicting depth, surface normals and semantic labels with a common multi-scale convolutional architecture[C]// IEEE International Conference on Computer Vision, 2015: 2650－2658.

[32] Chen L C, Papandreou G, Kokkinos I, et al. Semantic image segmentation with deep convolutional nets and fully connected CRFs[J]. Computer Science, 2014(4): 357－361.

[33] Long J, Shelhamer E, Darrell T. Fully convolutional networks for semantic segmentation[C]// 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015: 3431－3440.

[34] Maxwell A E, Warner T A, Fang F. Implementation of machine-learning classification in remote sensing: An applied review[J]. International Journal of Remote Sensing, 2018, 39(9): 2784－2817.

[35] Dalponte M, Orka H O, Gobakken T, et al. Tree species classification in boreal forests with hyperspectral data[J]. IEEE Transactions on Geoscience and Remote Sensing, 2013, 51(5): 2632－2645.

[36] Du P, Samat A, Waske B, et al. Random forest and rotation forest for fully polarized SAR image classification using polarimetric and spatial features[J]. ISPRS Journal of Photogrammetry and Remote Sensing, 2015, 105: 38－53.

[37] Zhao H, Shi J, Qi X, et al. Pyramid scene parsing network[C]//IEEE Conf. on Computer Vision and Pattern Recognition (CVPR). 2017: 2881－2890.

Rural construction land extraction from high spatial resolution remote sensing image based on SegNet semantic segmentation model

Yang Jianyu1,2, Zhou Zhenxu1, Du Zhenrong1, Xu Quanquan1, Yin Hang1, Liu Rui1

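(1. College of Land Science and Technology, China Agricultural University, Beijing 100083, China; 2. Key Laboratory of Agricultural Land Quality and Monitoring, Ministry of Land and Resources, Beijing 100035, China)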

Abstract: With the advancement of remote sensing technology, high spatial resolution remote sensing images contain rich and detailed spatial information. At the same time, their complexity places higher demands on image classification techniques. Given the more pronounced geometric structure and richer texture of such images, designing a rational feature system and selecting an appropriate classification algorithm to determine the quantity and distribution of rural construction land accurately and quickly is of great significance for balancing urban and rural development, saving land, and achieving sustainable development. It also helps explore the application of deep learning models to building extraction from high spatial resolution images and to improving their classification accuracy. In this paper, the semantic segmentation model SegNet was used to extract buildings. SegNet is mainly composed of an encoder network, a decoder network, and a pixel-wise classification layer. The encoder network transforms high-dimensional vectors into low-dimensional vectors, extracting low-dimensional representations of high-dimensional features. The decoder network maps low-resolution feature maps to high spatial resolution feature maps, reconstructing high-dimensional vectors from low-dimensional ones. The softmax classifier classifies each pixel separately and outputs the probability that the pixel belongs to each class. One 3 000×3 000-pixel slice and two 2 000×2 000-pixel slices were taken from the remote sensing image covering Bazhou City, Hebei Province, as training samples, and one 3 000×3 000-pixel slice was taken as the validation sample.
Five comparative experiments were used to extract buildings, using PSPNet, support vector machine, random forest, ISO clustering, and the maximum likelihood method. The confusion matrix of each method was obtained by comparing its classification result with the ground-truth labels. From traditional classification algorithms to shallow learning algorithms to deep learning algorithms, the Kappa coefficient and the building F1 score increased steadily, and SegNet, based on a deep convolutional network, outperformed the other five algorithms in extracting buildings from high spatial resolution images. SegNet achieved a Kappa coefficient of 0.90 and an overall accuracy of 96.61%, and its classification result was almost identical to the ground truth. Its building F1 score was 0.91, whereas those of the other five algorithms were at most 0.87. SegNet also had the lowest building false discovery rate, 9.71%, indicating that it identified buildings in high spatial resolution images better than traditional classification algorithms, shallow machine learning algorithms, and the PSPNet semantic segmentation algorithm. The Kappa coefficients and overall accuracies of the other five algorithms were at most 0.83 and 94.68%, respectively, and their results differed considerably from the ground truth. SegNet makes use of both spectral information and abundant spatial information.
During training, SegNet learns more essential features and eventually forms features well suited to pattern classification, which enhances the model's convergence and generalization and improves classification accuracy. Traditional classification algorithms such as ISO clustering and the maximum likelihood method fail to use the rich spatial information in high-resolution images, so their accuracy is relatively low. Because of their limited computing units and the large data volume of high spatial resolution images, shallow machine learning algorithms such as support vector machine and random forest cannot effectively represent the complex features of ground objects, so their advantages in building extraction from such images are limited. The experimental results show that the deep-learning-based SegNet performs best, which is of theoretical significance for applying deep learning models to remote sensing image classification. The results also provide a reference for improving the classification accuracy of high-resolution remote sensing images.

Keywords: remote sensing; image segmentation; algorithms; deep learning; SegNet semantic segmentation model; high-resolution remote sensing image; rural construction land extraction

Received: 2018-11-12

Revised: 2019-02-06

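Foundation item: Special Scientific Research Fund for Public Welfare Industry of the Ministry of Land and Resources (201511010-06)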

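Yang Jianyu, male, from Yichang, Hubei; vice dean, professor, and doctoral supervisor, mainly engaged in research on 3S technologies and their applications to land. Email: ycjyyang@cau.edu.cn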

doi: 10.11975/j.issn.1002-6819.2019.05.031

CLC number: S127

Article ID: 1002-6819(2019)-05-0251-08

Yang Jianyu, Zhou Zhenxu, Du Zhenrong, Xu Quanquan, Yin Hang, Liu Rui. Rural construction land extraction from high spatial resolution remote sensing image based on SegNet semantic segmentation model[J]. Transactions of the Chinese Society of Agricultural Engineering (Transactions of the CSAE), 2019, 35(5): 251－258. (in Chinese with English abstract) doi: 10.11975/j.issn.1002-6819.2019.05.031 http://www.tcsae.org