Name: Google Vision Transformer
Brand: Google Vision Transformer
Availability: InStock

Product Details

Google Vision Transformer (ViT) is an image recognition model based on the Transformer encoder. It was pre-trained on the large-scale ImageNet-21k dataset and then fine-tuned on ImageNet, which makes it a capable image feature extractor. The model splits each image into fixed-size patches and linearly embeds each patch. It then prepends a classification token to the resulting sequence and adds position embeddings before passing the sequence to the Transformer encoder. To perform a task such as image classification, users add a linear layer on top of the pre-trained encoder. The model's main strengths are its image feature learning and its applicability across a range of vision tasks. It is free to use.
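The input pipeline described above (split into patches, embed linearly, add position information) can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions, not the model's actual implementation: the 224x224 input, 16x16 patch size, and 768-dimensional embedding are assumed values that the description does not state, the weights are random stand-ins for learned parameters, and the names `patchify` and `embed` are hypothetical.

```python
import numpy as np


def patchify(image: np.ndarray, patch_size: int) -> np.ndarray:
    """Split an (H, W, C) image into flattened, non-overlapping patches."""
    h, w, c = image.shape
    if h % patch_size or w % patch_size:
        raise ValueError("image height and width must be divisible by patch_size")
    rows, cols = h // patch_size, w // patch_size
    # (rows, P, cols, P, C) -> (rows, cols, P, P, C) -> (rows * cols, P * P * C)
    patches = image.reshape(rows, patch_size, cols, patch_size, c)
    patches = patches.transpose(0, 2, 1, 3, 4)
    return patches.reshape(rows * cols, patch_size * patch_size * c)


def embed(image, patch_size, w_proj, b_proj, cls_token, pos_embed):
    """Build the encoder input: projected patches, class token, positions."""
    tokens = patchify(image, patch_size) @ w_proj + b_proj  # (N, D)
    tokens = np.concatenate([cls_token, tokens], axis=0)    # (N + 1, D)
    return tokens + pos_embed                               # add position embeddings


# Assumed sizes (not given in the description): 224x224 RGB, 16x16 patches, D = 768.
rng = np.random.default_rng(0)
image = rng.random((224, 224, 3))
patch_size, dim = 16, 768
num_patches = (224 // patch_size) ** 2

# Random stand-ins for parameters that would be learned during pre-training.
w_proj = rng.normal(scale=0.02, size=(patch_size * patch_size * 3, dim))
b_proj = np.zeros(dim)
cls_token = np.zeros((1, dim))
pos_embed = rng.normal(scale=0.02, size=(num_patches + 1, dim))

sequence = embed(image, patch_size, w_proj, b_proj, cls_token, pos_embed)
print(sequence.shape)  # (197, 768): 196 patch tokens plus one class token
```

With these assumed sizes, a 224x224 image yields 14 x 14 = 196 patches, so the encoder receives a sequence of 197 tokens.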

Main Features

1. Image feature extraction based on the Transformer encoder
2. Supports tasks such as image classification
3. Pre-trained models can be used for transfer learning
4. Suitable for large-scale image data
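The transfer-learning feature listed above usually means keeping the pre-trained encoder and training only a small linear layer on its output features. The sketch below shows that linear head in NumPy, assuming the encoder has already produced one 768-dimensional feature vector per image. The feature size, the class count, and the name `linear_head` are hypothetical, and the random features are a stand-in for real encoder outputs.

```python
import numpy as np


def linear_head(features: np.ndarray, weights: np.ndarray, bias: np.ndarray) -> np.ndarray:
    """Map encoder features of shape (B, D) to class probabilities (B, K)."""
    logits = features @ weights + bias
    logits = logits - logits.max(axis=-1, keepdims=True)  # numerical stability
    exp = np.exp(logits)
    return exp / exp.sum(axis=-1, keepdims=True)          # softmax over classes


rng = np.random.default_rng(0)
features = rng.normal(size=(4, 768))  # stand-in for pre-trained encoder outputs
num_classes = 10                      # assumed size of the downstream label set

# A freshly initialized head; only these parameters would be trained.
weights = np.zeros((768, num_classes))
bias = np.zeros(num_classes)

probs = linear_head(features, weights, bias)
print(probs.shape)  # (4, 10)
```

Before any training, the zero-initialized head assigns every class the same probability. Fitting `weights` and `bias` on labeled examples is what adapts the model to a new task.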

Target Users

Suitable for scenarios such as image classification, object detection, and image segmentation
