[1910.11102] Multi-View Features and Hybrid Reward Strategies for Vatex Video Captioning Challenge 2019
Because some of our models used all of the data, including the validation set, we report results only on the test set.
Abstract: This document describes our solution for the VATEX Captioning Challenge 2019,
which requires generating descriptions of videos in both English and
Chinese. We identified three crucial factors that improve
performance: multi-view features, hybrid reward, and diverse ensemble.
Our method achieves 2nd and 3rd place on the Chinese and English video
captioning tracks, respectively.
‹Figure 1. Example of captions from our baseline model and the ground-truth captions. (Introduction)›