Abstract: We present in this paper a new architecture, named Convolutional vision Transformer (CvT), that improves Vision Transformer (ViT) in performance and efficiency by introducing convolutions ...