Off-policy self-critical training for transformer in visual paragraph generation