A review of CNN-based typical urban land cover segmentation techniques in multispectral remote sensing imagery

Bibliographic Details
Main Authors: Zhao, Haimeng, Raihani, Mohamed, Ng, Seng Beng, Mohd, Ismail
Format: Article
Language: en
Published: Penerbit Universiti Kebangsaan Malaysia 2026
Subjects:
Online Access: http://psasir.upm.edu.my/id/eprint/124131/1/124131.pdf
http://psasir.upm.edu.my/id/eprint/124131/
https://www.ukm.my/jsm/pdf_files/SM-PDF-55-2-2026/3.pdf
Description
Summary: Compared with visible-light remote sensing, multispectral remote sensing provides multi-band land surface information and enhances spectral separability through data fusion, thereby enabling more accurate surface representation. However, spectral redundancy, resolution discrepancies, and highly complex urban environments impose greater challenges on existing methods. Deep learning approaches based on convolutional neural networks (CNNs) offer superior capabilities in extracting and integrating multispectral features, enabling more accurate urban land cover segmentation. This review focuses on pixel-level urban land cover segmentation and systematically summarizes recent advances in deep learning for multispectral remote sensing. First, we emphasize that the rich spectral information and spatial complementarity of multispectral data effectively enhance segmentation performance and alleviate ambiguities caused by the ‘same spectrum, different objects’ and ‘same object, different spectra’ phenomena. Second, we review 19 publicly available multispectral datasets, highlighting differences in spectral bands, spatial resolution, and application scenarios, and summarize a standardized preprocessing pipeline including radiometric calibration, geometric correction, band normalization, and spectral dimensionality reduction to support reproducibility. Third, we discuss representative spectral-spatial feature extraction and cross-scale context modeling strategies, covering dilated convolution, 3D-2D hybrid structures, dual-branch architectures, and multi-scale enhancement modules. Extensive comparative experiments on the ISPRS Potsdam and GID datasets further demonstrate the applicability and performance differences of representative models. Finally, future research trends and directions are discussed, encompassing multi-temporal and multi-scale temporal learning, cross-modal fusion, and the lightweight design of complex models.
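To make the band-normalization step of the preprocessing pipeline concrete, the following is a minimal sketch in pure Python. It assumes a hypothetical data layout in which a multispectral image is a list of bands, each band a 2-D list of pixel values; the function name `normalize_bands` and this layout are illustrative assumptions, not part of the reviewed pipeline. Each band is z-score normalized independently so that bands with very different radiometric ranges contribute on a comparable scale to a CNN.

```python
# Illustrative sketch of per-band z-score normalization, one common form of
# the "band normalization" preprocessing step. The bands[b][row][col] layout
# is an assumption made for this example.
from statistics import mean, pstdev

def normalize_bands(bands):
    """Z-score normalize each spectral band independently."""
    normalized = []
    for band in bands:
        values = [v for row in band for v in row]
        mu = mean(values)
        sigma = pstdev(values) or 1.0  # guard against constant bands
        normalized.append([[(v - mu) / sigma for v in row] for row in band])
    return normalized

# Toy 2-band, 2x2 image with very different radiometric ranges.
bands = [[[0, 2], [0, 2]], [[10, 10], [10, 30]]]
out = normalize_bands(bands)
```

After normalization, every band has zero mean and unit variance, regardless of its original value range; in practice this step would be applied after radiometric calibration and before any dimensionality reduction.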