View thesis information

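Chinese title:	基于深度学习的近地面稻麦生育期识别模型构建研究
Author:	Cai Yucheng (蔡宇澄)
Student ID:	2022101187
Confidentiality level:	Public
Thesis language:	chi
Discipline code:	0901Z1
Discipline name:	Agriculture - Crop Science - Agricultural Informatics
Student type:	Master's
Degree:	Master of Agriculture
Institution:	Nanjing Agricultural University
School:	College of Agriculture
Major:	Agricultural Informatics
Research direction:	Agricultural big data
First supervisor name:	Zhang Xiaohu (张小虎)
First supervisor affiliation:	Nanjing Agricultural University
Completion date:	2025-04-24
Defense date:	2025-05-26
Foreign-language title:	Research on Constructing Near-surface Rice and Wheat Phenology Detection Model Based on Deep Learning
Chinese keywords:	稻麦生育期识别 ; 深度学习 ; 时空特征 ; 知识蒸馏
Foreign-language keywords:	Rice and wheat phenology detection ; Deep learning ; Spatiotemporal features ; Knowledge distillation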
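Chinese abstract (translated):	Wheat and rice are major grain crops in China, and their high and stable yields are an important guarantee of national food security. Accurate monitoring of rice and wheat phenological stages helps optimize field management, predict yield and harvest time, control pests and diseases, and adjust planting structure, and is therefore important for improving planting returns. Traditional monitoring relies mostly on manual field surveys, which are labor-intensive and subject to large subjective bias. With the development of crop growth monitoring platforms and intelligent algorithms, methods that obtain rice and wheat phenology from satellites, unmanned aerial vehicles, and similar platforms have been widely studied, but their limited temporal resolution makes them insufficiently practical. The lack of good methods for identifying rice and wheat phenology from near-surface, ground-based platforms has become a key limiting factor. In recent years, the introduction of deep learning has created new opportunities for phenology identification on ground-based platforms. Deep learning models can automatically extract key features such as leaf morphology, stalk height, and spike development from images, and these features accurately reflect the growth changes of wheat and rice across phenological stages. Rice and wheat phenology identification is essentially fine-grained image classification. Because effective fusion of spatial structural features and time-series features has not been explored, models trained on single images achieve low recognition accuracy. Research on real-time deployment and visualization analysis of such models is also lacking.
Based on these problems, this study follows the main line of "data acquisition, model construction, model optimization" and, focusing on the key problems of spatiotemporal feature fusion and knowledge distillation in whole-season phenology identification, carries out research on rice and wheat phenology identification from near-surface imagery. The main work is as follows.
(1) A rice and wheat phenology identification model based on near-surface camera image sequences and spatiotemporal feature fusion was constructed. Multi-temporal image sequence datasets replaced single-temporal image datasets to build whole-season rice and wheat phenology datasets based on time-series features. Three spatiotemporal feature fusion schemes were designed (sequential, synchronous, and parallel fusion), yielding three model architectures that explore how to fuse spatial structural features and time-series features effectively and thereby optimize the identification model. The results show that the sequential fusion architecture completes the identification task effectively. On the wheat dataset, sequential fusion reached an overall accuracy of 0.935, a mean absolute error of 0.069, an F1 score of 0.936, and a Kappa coefficient of 0.924. On the rice dataset, it reached an overall accuracy of 0.931, a mean absolute error of 0.070, an F1 score of 0.930, and a Kappa coefficient of 0.920. Compared with the standard residual network, overall accuracy improved by 5.3% (wheat) and 8.2% (rice). The method performed best at the maturity stage of wheat and rice, with F1 scores of 0.995 and 0.985, and worst at the wheat anthesis stage and the rice booting stage, with F1 scores of 0.902 and 0.891. The model extends the applicability of deep learning models that jointly use spatiotemporal features and improves phenology identification accuracy in near-surface images.
(2) An optimization method for the identification model based on knowledge distillation and attention transfer was proposed, to address the parameter redundancy, complex inference, and inability to deploy in real time caused by introducing multi-temporal image sequences. The sequential fusion model built in the previous chapter served as the teacher model, and soft labels, hard labels, and the attention feature maps of the teacher's intermediate layers were combined to train the student model, ResNet50, by knowledge distillation, yielding an identification model that takes a single image as input. Visualizing the attention feature maps of the intermediate layers reveals the key features and dynamic changes during rice and wheat growth and addresses the black-box limitation of deep learning models. The experiments show that, with knowledge distillation training, combining the attention feature maps of teacher layers 1, 3, and 4 gives the best result. The proposed method reached overall accuracies of 0.927 and 0.921 on the wheat and rice datasets, only 0.8% and 1.0% lower than the teacher model, while the required input was simplified from a complex image sequence to a single image, lowering the model's input requirements. The attention visualizations reveal the model's internal mechanism: the first layer attends mainly to shallow texture information in the wheat and rice images, the second layer focuses on background regions, and the third and fourth layers progressively extract high-level semantic features. Effectively combining shallow and deep features is important for improving identification accuracy. Finally, the method reached overall accuracies of 0.917 and 0.905 on unseen second-year wheat and rice datasets, verifying the practicality and generalization ability of the model under real field conditions.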
Foreign-language abstract:	As staple grain crops in China, wheat and rice play a crucial role in ensuring food security. Accurate monitoring of their phenological stages is essential for optimizing field management, predicting yield and harvest times, controlling pests and diseases, and adjusting planting structures, thereby improving agricultural efficiency. Traditional methods for monitoring rice and wheat phenology rely primarily on manual field surveys, which are labor-intensive and prone to subjective bias. With the development of crop growth monitoring platforms and intelligent algorithms, methods that derive rice and wheat phenological stages from satellite and unmanned aerial vehicle (UAV) data have been widely studied. However, their limited temporal resolution constrains their use in real-time agricultural applications. A key limiting factor is the lack of effective methods for phenology detection from near-surface, ground-based platforms. In recent years, deep learning has opened new opportunities for near-surface phenology detection in wheat and rice. Deep learning models can automatically extract key features such as leaf morphology, stalk height, and spike development from images, and these features reflect the growth dynamics of wheat and rice at different phenological stages. Rice and wheat phenology detection is essentially a fine-grained image classification task. Because effective fusion of spatial structural features and time-series features has been little explored, models are currently trained on single images only, which leads to relatively low accuracy. Research on real-time model deployment and visualization analysis is also lacking.
To solve these problems, this study follows a structured approach of "data acquisition, model construction, and model optimization". It focuses on the key problems of spatiotemporal feature fusion and knowledge distillation in rice and wheat phenology detection, and carries out research on phenology detection based on near-surface camera images in the field. The main work of this study is as follows.
(1) A rice and wheat phenology detection model based on near-surface camera image series and spatiotemporal feature fusion is constructed. A multi-temporal image series dataset replaces the single-temporal image dataset to build a whole-season rice and wheat phenology dataset based on time-series features. Three spatiotemporal feature fusion methods (sequential fusion, synchronous fusion, and parallel fusion) are designed, and three corresponding model architectures are constructed to explore how spatial structural features and time-series features can be fused effectively to optimize the detection model. The results show that the sequential fusion architecture can effectively perform rice and wheat phenology detection. On the wheat dataset, sequential fusion achieves an overall accuracy of 0.935, a mean absolute error of 0.069, an F1 score of 0.936, and a Kappa coefficient of 0.924. On the rice dataset, it achieves an overall accuracy of 0.931, a mean absolute error of 0.070, an F1 score of 0.930, and a Kappa coefficient of 0.920. Compared with the standard residual network, overall accuracy improves by 5.3% (wheat) and 8.2% (rice). The method performs best at the maturity stage of both wheat and rice, with F1 scores of 0.995 and 0.985, respectively. Its performance is comparatively lower at the anthesis stage of wheat and the booting stage of rice, with F1 scores of 0.902 and 0.891, respectively. The model extends the applicability of deep learning models that make combined use of spatiotemporal features and improves the accuracy of rice and wheat phenology detection in near-surface images.
(2) An optimization method for the rice and wheat phenology detection model based on knowledge distillation and attention transfer is proposed, to address the parameter redundancy, complex inference, and real-time deployment constraints caused by introducing multi-temporal image series. The sequential fusion model obtained in the previous chapter is used as the teacher model, and the student model, ResNet50, is trained by knowledge distillation that combines soft labels, hard labels, and the attention maps of the teacher model's intermediate layers. The result is a phenology detection model that takes a single image as input. The attention maps of the model's intermediate layers are also visualized to explain the key features and changes in the crop growth process and to mitigate the black-box limitation of deep learning models. The experimental results show that, with knowledge distillation training, combining the attention feature maps from the 1st, 3rd, and 4th layers of the teacher model yields the best performance. The proposed method achieves overall accuracies of 0.927 and 0.921 on the wheat and rice datasets, respectively, a decrease of only 0.008 and 0.010 (0.8 and 1.0 percentage points) relative to the teacher model, while the required input is simplified from an image series to a single image, reducing the model's input data requirements. The visualization of the attention feature maps provides insight into the internal mechanisms of the model. The first layer attends mainly to low-level texture information in the wheat and rice images, the second layer focuses on background regions, and the third and fourth layers progressively extract higher-level semantic features. The effective combination of shallow and deep features is important for improving phenological stage detection accuracy. Finally, the method attains overall accuracies of 0.917 and 0.905 on unseen second-year wheat and rice datasets, respectively, validating its practicality and generalization capability under real field conditions.
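The training objective described in part (2) of the abstract combines three signals: hard labels, the teacher's soft labels, and attention maps from selected teacher layers. The sketch below shows one common way such a loss is assembled, using temperature-scaled distillation plus activation-based attention transfer. It is an illustration under stated assumptions, not the thesis's implementation: the temperature, the weights `alpha` and `beta`, the attention-map definition (channel-wise sum of squared activations, L2-normalized), and all function names are hypothetical, and the abstract does not specify any of them. Only the choice of teacher layers 1, 3, and 4 comes from the abstract.

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of class logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(logits, label):
    """Hard-label loss: negative log-probability of the true stage."""
    return -math.log(softmax(logits)[label])

def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions with q > 0 everywhere."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def attention_map(features):
    """Collapse C channel maps (each a flat list of H*W activations)
    into one spatial map: sum of squared activations, L2-normalized."""
    size = len(features[0])
    amap = [sum(channel[i] ** 2 for channel in features) for i in range(size)]
    norm = math.sqrt(sum(a * a for a in amap)) or 1.0
    return [a / norm for a in amap]

def attention_transfer_loss(student_feats, teacher_feats):
    """Squared L2 distance between student and teacher attention maps."""
    s, t = attention_map(student_feats), attention_map(teacher_feats)
    return sum((a - b) ** 2 for a, b in zip(s, t))

def distillation_loss(student_logits, teacher_logits, label,
                      student_layers, teacher_layers,
                      layers=(0, 2, 3),   # layers 1, 3, 4 (zero-based), per the abstract
                      temperature=4.0,    # assumed value
                      alpha=0.5,          # assumed soft/hard balance
                      beta=1.0):          # assumed attention weight
    """Weighted sum of hard-label, soft-label, and attention-transfer terms."""
    hard = cross_entropy(student_logits, label)
    soft = temperature ** 2 * kl_divergence(
        softmax(teacher_logits, temperature),
        softmax(student_logits, temperature),
    )
    attention = sum(
        attention_transfer_loss(student_layers[i], teacher_layers[i])
        for i in layers
    )
    return (1 - alpha) * hard + alpha * soft + beta * attention

# Toy usage: 3 phenological stages, 4 layers of 2 channels x 4 positions.
student_logits = [2.0, 0.5, -1.0]
teacher_logits = [3.0, 0.2, -2.0]
student_layers = [[[0.1, 0.2, 0.3, 0.4], [0.0, 1.0, 0.5, 0.2]] for _ in range(4)]
teacher_layers = [[[0.2, 0.1, 0.4, 0.3], [0.1, 0.9, 0.6, 0.1]] for _ in range(4)]
loss = distillation_loss(student_logits, teacher_logits, 0,
                         student_layers, teacher_layers)
```

When the student reproduces the teacher's logits and attention maps exactly, the soft-label and attention terms vanish and only the weighted hard-label term remains. The sketch also assumes that student and teacher feature maps share the same spatial size at each selected layer, which a real teacher operating on image sequences may not satisfy without an extra pooling or projection step.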
Chinese Library Classification number:	S51
Open-access date:	2025-06-10
