Multi-Grained feature aggregation based on Transformer for unsupervised person re-identification
Abstract
Person re-identification aims to retrieve specific person targets across different surveillance cameras. Posture changes, object occlusion, and background interference all degrade re-identification performance. To make full use of the extracted person features, a Transformer-based multi-grained feature aggregation method for unsupervised person re-identification is proposed. First, a Dual-Channel Attention module is designed so that the network can adaptively adjust its receptive field size according to multi-scale input information, helping it capture relationships between different parts of a person's body. This strengthens the network's ability to extract person features, allowing it to focus on the most critical image information and to output more representative person descriptors. Next, an Explicit Visual Center module is proposed to capture global information and aggregate essential local information, enriching the network's feature representation and thereby improving the model's generalization capability. Finally, experiments on the popular Market1501, DukeMTMC-reID, and MSMT17 datasets validate the proposed approach. The results show that the improved model achieves higher performance metrics, yielding greater recognition accuracy and more representative person features.
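The adaptive receptive-field adjustment described above can be illustrated with a minimal plain-Python sketch. It assumes a selective-fusion scheme in the spirit of selective-kernel attention: two branches (e.g., different receptive fields) produce feature maps, a per-channel descriptor is obtained by global average pooling, and a softmax over the two descriptors yields channel-wise fusion weights. The function names, the two-branch restriction, and the nested-list tensor layout are illustrative assumptions, not the paper's actual module.

```python
import math

def channel_descriptor(feat):
    """Global average pooling over a [C][H][W] nested-list feature map."""
    return [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in feat]

def selective_fuse(branch_a, branch_b):
    """Fuse two same-shaped branches with per-channel softmax weights.

    Channels whose descriptor is larger in one branch draw more weight
    from that branch, mimicking adaptive receptive-field selection.
    """
    da, db = channel_descriptor(branch_a), channel_descriptor(branch_b)
    fused = []
    for c, (sa, sb) in enumerate(zip(da, db)):
        ea, eb = math.exp(sa), math.exp(sb)
        wa, wb = ea / (ea + eb), eb / (ea + eb)  # softmax over the 2 branches
        h, w = len(branch_a[c]), len(branch_a[c][0])
        fused.append([[wa * branch_a[c][i][j] + wb * branch_b[c][i][j]
                       for j in range(w)] for i in range(h)])
    return fused
```

For a single channel where branch B has the stronger response (descriptor 3 vs. 1), the fused map leans toward branch B, since its softmax weight exceeds 0.5.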
DOI: 10.61416/ceai.v26i1.8822