Rethinking person re-identification via semantic-based pretraining

摘要

Pretraining is a dominant paradigm in computer vision. Generally, supervised ImageNet pretraining is commonly used to initialize the backbones of person re-identification (Re-ID) models. However, recent works show a surprising result that CNN-based pretraining on ImageNet has limited impacts on Re-ID system due to the large domain gap between ImageNet and person Re-ID data. To seek an alternative to traditional pretraining, here we investigate semantic-based pretraining as another method to utilize additional textual data against ImageNet pretraining. Specifically, we manually construct a diversified FineGPR-C caption dataset for the first time on person Re-ID events. Based on it, a pure semantic-based pretraining approach named VTBR is proposed to adopt dense captions to learn visual representations with fewer images. We train convolutional neural networks from scratch on the captions of FineGPR

出版物
ACM Transactions on Multimedia Computing, Communications and Applications
向孙程
向孙程
2017级博士生
高景盛
高景盛
2020级博士生
刘婷
刘婷
讲师
付宇卓
付宇卓
教授 博士生导师