
Aggregating Rich Hierarchical Features for Scene Classification in Remote Sensing Imagery
Oct 30, 2017

Title: Aggregating Rich Hierarchical Features for Scene Classification in Remote Sensing Imagery

 Authors: Wang, GL; Fan, B; Xiang, SM; Pan, CH

 Author Full Names: Wang, Guoli; Fan, Bin; Xiang, Shiming; Pan, Chunhong

 Source: IEEE JOURNAL OF SELECTED TOPICS IN APPLIED EARTH OBSERVATIONS AND REMOTE SENSING, 10 (9):4104-4115; SI 10.1109/JSTARS.2017.2705419 SEP 2017

 Language: English

 Abstract: Scene classification is one of the most important problems in remote sensing image processing. To obtain a highly discriminative feature representation for an image to be classified, traditional methods usually densely accumulate hand-crafted low-level descriptors (e.g., scale-invariant feature transform) via feature encoding techniques. However, performance is largely limited by the hand-crafted descriptors, which cannot describe the rich semantic information contained in varied remote sensing images. To alleviate this problem, we propose a novel method that extracts discriminative image features from the rich hierarchical information in convolutional neural networks (CNNs). Specifically, the low-level and middle-level intermediate convolutional features are each encoded by the vector of locally aggregated descriptors (VLAD) and then reduced by principal component analysis to obtain hierarchical global features; meanwhile, the fully connected features are average pooled and subsequently normalized to form additional global features. The proposed encoded mixed-resolution representation (EMR) is the concatenation of all of these global features. Because of the encoding strategies (VLAD and average pooling), our method can handle images of different sizes. In addition, to reduce the computational cost of the training stage, we extract EMR directly from VGG-VD and ResNet models pretrained on the ImageNet dataset. We show that CNNs pretrained on a natural image dataset transfer more easily to a remote sensing dataset when the local structure similarity between the two datasets is higher. Experimental evaluations on the UC-Merced and Brazilian Coffee Scenes datasets demonstrate that our method is superior to the state of the art.
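The pipeline sketched in the abstract (VLAD-encode intermediate convolutional features, reduce them with PCA, average-pool and normalize the fully connected features, then concatenate into EMR) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the feature tensors, codebook size (K=16), descriptor dimensions (196 locals of dim 256, FC dim 4096), and PCA target dimension are placeholder assumptions standing in for features extracted from a pretrained VGG-VD or ResNet.

```python
import numpy as np

def vlad_encode(descriptors, centers):
    """VLAD: assign each local descriptor to its nearest codebook center
    and accumulate the residuals per center, then flatten and normalize."""
    # Squared distances from every descriptor to every center (N x K)
    d2 = ((descriptors[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
    assign = d2.argmin(axis=1)
    K, dim = centers.shape
    vlad = np.zeros((K, dim))
    for k in range(K):
        members = descriptors[assign == k]
        if len(members):
            vlad[k] = (members - centers[k]).sum(axis=0)
    vlad = vlad.ravel()
    # Signed square-root (power) normalization followed by L2, as is
    # standard practice for VLAD vectors
    vlad = np.sign(vlad) * np.sqrt(np.abs(vlad))
    return vlad / (np.linalg.norm(vlad) + 1e-12)

def pca_reduce(vectors, n_components):
    """Dimensionality reduction via SVD of the mean-centered data."""
    X = vectors - vectors.mean(axis=0)
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    return X @ Vt[:n_components].T

rng = np.random.default_rng(0)
# Placeholder stand-ins for CNN features of a batch of 4 images:
# 196 intermediate conv descriptors of dim 256 per image (hypothetical sizes)
conv_feats = [rng.normal(size=(196, 256)) for _ in range(4)]
centers = rng.normal(size=(16, 256))   # K=16 VLAD codebook (illustrative)
fc_feats = rng.normal(size=(4, 4096))  # stand-in fully connected features

# Encode conv features with VLAD, then reduce by PCA
vlad_vecs = np.stack([vlad_encode(f, centers) for f in conv_feats])
vlad_reduced = pca_reduce(vlad_vecs, n_components=3)

# L2-normalize the (pooled) fully connected features
fc_global = fc_feats / np.linalg.norm(fc_feats, axis=1, keepdims=True)

# EMR: concatenation of the hierarchical global features
emr = np.concatenate([vlad_reduced, fc_global], axis=1)
print(emr.shape)  # → (4, 4099)
```

Because VLAD pools an arbitrary number of local descriptors into a fixed K x dim vector, the encoding is independent of input image size, which is the property the abstract relies on for handling images of different resolutions.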

 ISSN: 1939-1404

 eISSN: 2151-1535

 IDS Number: FJ3IU

 Unique ID: WOS:000412626400025
