WINSLOW Discover, learn and grow

[Paper Reading] An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale

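Paper link: An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale. ViT treats an image as a sequence of patch tokens, rather than as a pixel grid or a convolutional feature map, and then feeds that sequence directly into a standard Transformer encoder for global modeling…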
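To make the "patch token sequence" idea concrete, here is a minimal sketch in plain Python. It only covers the splitting and flattening step. The function name `image_to_patch_tokens` and the nested-list image layout (height x width x channels) are assumptions made for illustration and do not come from the post. In the actual model, each flattened patch is then linearly projected, and a class token and position embeddings are added before the Transformer encoder; none of that is shown here.

```python
from typing import List

# Illustrative layout (an assumption): image[row][col] is a list of channel values.
Image = List[List[List[float]]]


def image_to_patch_tokens(image: Image, patch_size: int) -> List[List[float]]:
    """Split an H x W x C image into non-overlapping square patches.

    Each patch is flattened into one vector of length patch_size * patch_size * C.
    Patches are returned in row-major order (left to right, then top to bottom).
    """
    height, width = len(image), len(image[0])
    if height % patch_size or width % patch_size:
        raise ValueError("image height and width must be divisible by patch_size")

    tokens = []
    for top in range(0, height, patch_size):
        for left in range(0, width, patch_size):
            # Flatten one patch: iterate over its rows, columns, then channels.
            token = [
                value
                for row in range(top, top + patch_size)
                for col in range(left, left + patch_size)
                for value in image[row][col]
            ]
            tokens.append(token)
    return tokens


# A blank 224 x 224 RGB image cut into 16 x 16 patches.
image = [[[0.0, 0.0, 0.0] for _ in range(224)] for _ in range(224)]
tokens = image_to_patch_tokens(image, patch_size=16)
print(len(tokens), len(tokens[0]))  # 196 768
```

With these sizes the image becomes 14 x 14 = 196 tokens, each of length 16 x 16 x 3 = 768, which is the sequence the encoder would attend over globally.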

ysf Published on 2023-07-30