Visual Question Answering : From Theory to Application
Authors: Peng Wang, Xin Wang, Qi Wu
ISBN No.: 9789811909665
Pages: xiii, 238
Year: 2023 (May)
Format: Trade Paper
Price: $151.79
Dispatch delay: Dispatched between 7 and 15 days
Status: Available

Contents

1 Introduction . 1
1.1 Motivation . 1
1.2 Visual Question Answering in AI tasks . 4
1.3 Categorisation of VQA . 6
1.3.1 Classified by Data Settings . 6
1.3.2 Classified by Task Settings . 7
1.3.3 Others . 8
1.4 Book Overview . 8
References . 9

Part I Preliminaries

2 Deep Learning Basics . 15
2.1 Neural Networks . 15
2.2 Convolutional Neural Networks . 17
2.3 Recurrent Neural Networks and variants . 18
2.4 Encoder-Decoder Structure . 20
2.5 Attention Mechanism . 21
2.6 Memory Networks . 21
2.7 Transformer Networks and BERT . 23
2.8 Graph Neural Networks Basics . 24
References . 26

3 Question Answering (QA) Basics . 29
3.1 Rule-based methods . 29
3.2 Information retrieval-based methods . 30
3.3 Neural Semantic Parsing for QA . 31
3.4 Knowledge Base for QA . 31
References . 32

Part II Image-based VQA

4 The Classical Visual Question Answering . 37
4.1 Introduction . 37
4.2 Datasets . 38
4.3 Generation vs. Classification: Two answering policies . 39
4.4 Joint Embedding Methods . 40
4.4.1 Sequence-to-Sequence Encoder-Decoder Models . 40
4.4.2 Bilinear Encoding for VQA . 42
4.5 Awesome Attention Mechanisms . 44
4.5.1 Stacked Attention Networks . 44
4.5.2 Hierarchical Question-Image Co-attention . 47
4.5.3 Bottom-Up and Top-Down Attention . 48
4.6 Memory Networks for VQA . 50
4.6.1 Improved Dynamic Memory Networks . 50
4.6.2 Memory-Augmented Networks . 52
4.7 Compositional Reasoning for VQA . 54
4.7.1 Neural Modular Networks . 54
4.7.2 Dynamic Neural Module Networks . 56
4.8 Graph Neural Networks for VQA . 57
4.8.1 Graph Convolutional Networks . 58
4.8.2 Graph Attention Networks . 60
4.8.3 Graph Convolutional Networks for VQA . 62
4.8.4 Graph Attention Networks for VQA . 63
References . 65

5 Knowledge-based VQA . 69
5.1 Introduction . 69
5.2 Datasets . 70
5.3 Knowledge Bases introduction . 72
5.3.1 DBpedia . 72
5.3.2 ConceptNet . 73
5.4 Knowledge Embedding Methods . 73
5.4.1 Word-to-vector representation . 73
5.4.2 Bert-based representation . 75
5.5 Question-to-Query Translation . 76
5.5.1 Query-mapping based methods . 77
5.5.2 Learning based methods . 78
5.6 How to query knowledge bases . 79
5.6.1 RDF query . 79
5.6.2 Memory Network query . 81
References . 82

6 Vision-and-Language Pre-training for VQA . 87
6.1 Introduction . 87
6.2 General Pre-training Models . 88
6.2.1 Embeddings from Language Models . 88
6.2.2 Generative Pre-Training Model . 89
6.2.3 Bidirectional Encoder Representations from Transformers . 89
6.3 Popular Vision-and-Language Pre-training Methods . 93
6.3.1 Single-Stream Methods . 94
6.3.2 Two-Stream Methods . 96
6.4 Fine-tuning on VQA and Other Downstream Tasks . 98
References . 101

Part III Video-based VQA

7 Video Representation Learning . 105
7.1 Hand-crafted local video descriptors . 105
7.2 Data-driven deep learning features for video representation . 108
7.3 Self-supervised learning for video representation . 109
References . 110

8 Video Question Answering . 113
8.1 Introduction . 113
8.2 Datasets . 114
8.2.1 Multi-step reasoning dataset . 114
8.2.2 Single-step reasoning dataset . 118
8.3 Traditional Video Spatio-Temporal Reasoning Using Encoder-Decoder Framework . 119
References . 126

9 Advanced Models for Video Question Answering . 129
9.1 Attention on Spatio-Temporal Features .

