This paper studies the problem of concept-based interpretability of video Transformer representations. Specifically, we seek to explain the decision-making process of video Transformers in terms of high-level spatiotemporal concepts that are discovered automatically. Prior research on concept-based interpretability has focused only on image-level tasks. Video models, in contrast, must handle an additional temporal dimension, which increases complexity and makes it harder to identify dynamic concepts that change over time. In this work, we systematically address these challenges by introducing the first Video Transformer Concept Discovery (VTCD) algorithm. To this end, we propose an efficient method for the unsupervised identification of units of video Transformer representations (concepts) and for ranking their importance to the model's output. The resulting concepts are highly interpretable, revealing spatiotemporal reasoning mechanisms and object-centric representations in unstructured video models. By performing this analysis jointly over a diverse set of supervised and self-supervised representations, we find that some of these mechanisms are common across video Transformers. Finally, we demonstrate that VTCD can be used to improve model performance on fine-grained tasks.