1. Sim-to-Real Transfer for Vision-and-Language Navigation
2. Integrating Egocentric Localization for More Realistic Point-Goal Navigation Agents
3. Seeing the Un-Scene: Learning Amodal Semantic Maps for Room Navigation
4. Hypothesis Sketching for Online Kernel Selection in Continuous Kernel Space
5. Feel The Music: Automatically Generating A Dance For An Input Song
6. Exploring Crowd Co-creation Scenarios for Sketches
7. Improving Vision-and-Language Navigation with Image-Text Pairs from the Web
8. Predicting A Creator's Preferences In, and From, Interactive Generative Art
9. SQuINTing at VQA Models: Interrogating VQA Models with Sub-Questions
10. Large-scale Pretraining for Visual Dialog: A Simple State-of-the-Art Baseline
11. Decentralized Distributed PPO: Solving PointGoal Navigation
12. Improving Generative Visual Dialog by Answering Diverse Questions
13. ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks
14. Chasing Ghosts: Instruction Following as Bayesian State Tracking
15. Towards VQA Models That Can Read
16. Embodied Question Answering in Photorealistic Environments With Point Cloud Perception
17. Habitat: A Platform for Embodied AI Research
18. Align2Ground: Weakly Supervised Phrase Grounding Guided by Image-Caption Alignment
19. Trick or TReAT: Thematic Reinforcement for Artistic Typography
20. Learning Dynamics Model in Reinforcement Learning by Incorporating the Long Term Future
21. CLEVR-Dialog: A Diagnostic Dataset for Multi-Round Reasoning in Visual Dialog
22. Cycle-Consistency for Robust Visual Question Answering
23. Taking a HINT: Leveraging Explanations to Make Vision and Language Models More Grounded
24. Audio Visual Scene-Aware Dialog
25. Neural Modular Control for Embodied Question Answering
26. Do explanations make VQA models more predictable to a human?
27. Visual Curiosity: Learning to Ask Questions to Learn Visual Recognition
28. Graph R-CNN for Scene Graph Generation
29. End-to-end Audio Visual Scene-aware Dialog Using Multimodal Attention-based Video Features
30. CoDraw: Collaborative Drawing as a Testbed for Grounded Goal-driven Communication
31. Don't Just Assume; Look and Answer: Overcoming Priors for Visual Question Answering
32. Tips and Tricks for Visual Question Answering: Learnings from the 2017 Challenge
33. Multi-modal Factorized Bilinear Pooling with Co-attention Learning for Visual Question Answering
34. Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering
35. C-VQA: A Compositional Split of the Visual Question Answering (VQA) v1.0 Dataset
36. Learning to Reason: End-to-End Module Networks for Visual Question Answering
37. An Analysis of Visual Question Answering Algorithms
38. Understanding Black-box Predictions via Influence Functions
39. CLEVR: A Diagnostic Dataset for Compositional Language and Elementary Visual Reasoning
40. Grad-CAM: Visual Explanations from Deep Networks via Gradient-Based Localization
41. Visual question answering: Datasets, algorithms, and future challenges
42. The Color of the Cat is Gray: 1 Million Full-Sentences Visual Question Answering (FSVQA)
43. Knowing who to listen to: Prioritizing experts from a diverse ensemble for attribute personalization
44. Towards Transparent AI Systems: Interpreting Visual Question Answering Models
45. Focused Evaluation for Image Description with Binary Forced-Choice Tasks
46. Answer-Type Prediction for Visual Question Answering
47. Question Relevance in VQA: Identifying Non-Visual And False-Premise Questions
48. DualNet: Domain-invariant network for visual question answering
49. Training Recurrent Answering Units with Joint Loss Minimization for VQA
50. Human Attention in Visual Question Answering: Do Humans and Deep Networks look at the same regions?
51. Multimodal Compact Bilinear Pooling for Visual Question Answering and Visual Grounding
52. Multimodal Residual Learning for Visual QA
53. Analyzing the Behavior of Visual Question Answering Models
54. Hierarchical Question-Image Co-Attention for Visual Question Answering
55. Leveraging Visual Question Answering for Image-Caption Ranking
56. Joint Unsupervised Learning of Deep Representations and Image Clusters
57. A Focused Dynamic Attention Model for Visual Question Answering
58. Generating Visual Explanations
59. Dynamic Memory Networks for Visual and Textual Question Answering
60. Visual Genome: Connecting Language and Vision Using Crowdsourced Dense Image Annotations
61. “Why Should I Trust You?”: Explaining the Predictions of Any Classifier
62. Learning Deep Features for Discriminative Localization
63. Deep Residual Learning for Image Recognition
64. MovieQA: Understanding Stories in Movies through Question-Answering
65. Simple Baseline for Visual Question Answering
66. Visual Madlibs: Fill in the Blank Description Generation and Question Answering
67. Learning Common Sense through Visual Abstraction
68. Where to Look: Focus Regions for Visual Question Answering
69. Ask Me Anything: Free-Form Visual Question Answering Based on Knowledge from External Sources
70. Ask, Attend and Answer: Exploring Question-Guided Spatial Attention for Visual Question Answering
71. Yin and Yang: Balancing and Answering Binary Visual Questions
72. Visual7W: Grounded Question Answering in Images
73. Explicit Knowledge-based Reasoning for Visual Question Answering
75. Deep Compositional Question Answering with Neural Module Networks
76. Stacked Attention Networks for Image Question Answering
77. Mind's eye: A recurrent visual representation for image caption generation
78. Are You Talking to a Machine? Dataset and Methods for Multilingual Image Question Answering
79. Exploring Nearest Neighbor Approaches for Image Captioning
80. Exploring Models and Data for Image Question Answering
81. Ask Your Neurons: A Neural-Based Approach to Answering Questions about Images
82. VQA: Visual Question Answering
83. Semantic classification of spacecraft's status: integrating system intelligence and human knowledge
84. Deep visual-semantic alignments for generating image descriptions
85. CIDEr: Consensus-based image description evaluation
86. From captions to visual concepts and back
87. Long-term recurrent convolutional networks for visual recognition and description
88. Show and tell: A neural image caption generator
89. Unifying Visual-Semantic Embeddings with Multimodal Neural Language Models
90. Explain Images with Multimodal Recurrent Neural Networks
91. A Multi-World Approach to Question Answering about Real-World Scenes based on Uncertain Input
92. GloVe: Global Vectors for Word Representation
93. Interactively Guiding Semi-Supervised Clustering via Attribute-Based Explanations
94. Zero-Shot Learning via Visual Abstraction
95. Very Deep Convolutional Networks for Large-Scale Image Recognition
96. Predicting User Annoyance Using Visual Attributes
97. Predicting Failures of Vision Systems
98. Microsoft COCO: Common Objects in Context
99. How Do You Tell a Blackbird from a Crow?
100. Learning the Visual Interpretation of Sentences
102
Bringing Semantics into Focus Using Visual Abstraction
103
Multi-attribute Queries: To Merge or Not to Merge?
104
Relative Attributes for Enhanced Human-Machine Communication
105
What makes Paris look like Paris?
106
The role of image understanding in contour detection
107
Automatic discovery of groups of objects for scene understanding
108
Discovering localized attributes for fine-grained recognition
109
Understanding the Intrinsic Memorability of Images
110
Recognizing jumbled images: The role of local and global information in image classification
111
Unbiased look at dataset bias
112
Finding the weakest link in person detectors
113
iCoseg: Interactive co-segmentation with intelligent scribble guidance
114
The role of features, algorithms and data in visual recognition
115
Seed Image Selection in interactive cosegmentation
116
ImageNet: A large-scale hierarchical image database
117
Unsupervised learning of hierarchical spatial structures in images
118
Semi-supervised co-training and active learning based approach for multi-view intrusion detection
119
From appearance to context-based recognition: Dense labeling in small images
120
Bringing diverse classifiers to common grounds: dtransform
121
Hierarchical Semantics of Objects (hSOs)
122
Combining classifiers for multisensor data fusion
123
Ensemble of classifiers approach for NDT data fusion
124
12-in-1: Multi-Task Vision and Language Representation Learning
125. Past Graduate Interns • Sarmista Velury
126. Cross-channel Communication Networks
127. Our work on teaching bots to navigate New York City using natural language was covered in MIT Technology Review, Forbes, and Fast Company
128. Our work on Embodied Question Answering (Embodied QA), a first step towards agents that can see, talk, and reason, was covered in MIT Technology Review and others
129. ADVISING ACTIVITY Current Graduate Advisees • Samyak Datta, Ph.D. student, Since Fall
130. Featured news story about my Google Faculty Research Award and Dhruv Batra's Office of Naval Research (ONR) Young Investigator Program (YIP) award
131. Grad-CAM: Why did you say that? Visual Explanations from Deep Networks via Gradient-based Localization
132. Hadamard Product for Low-rank Bilinear Pooling
133. A Corpus and Evaluation Framework for Deeper Understanding of Commonsense Stories
134. Deeper LSTM and normalized CNN Visual Question Answering model. https://github.com/VT-vision-lab/VQA_LSTM_CNN
135. • Future Directions in Computer Vision Department of Defense workshop
137. • National Science Foundation (NSF) Information and Intelligent Systems (IIS) Division
138. "Technical Opportunities on Campus" for first-year female engineering students
139. Can cartoons be used to teach machines to understand the visual world?
140. Inference for Order Reduction in MRFs
141. Indraprastha Institute of Information Technology
142. uWave: Accelerometer-based Personalized Gesture Recognition
143. Ensemble Based Data Fusion for Early Diagnosis of Alzheimer's Disease
144. Evaluate the Effect of Ground Tire Rubber on Laboratory Rutting Performance of Asphalt Concrete Mixtures
145. In 2019, I was featured in Vogue's "Dream Makers: How the women in AI are shaping our future"
146. SOrT-ing in VQA: Contrastive Gradient Learning for Improved Consistency
147. Program Committee of Workshops
148. 2011 (Oral) Marr Prize
149. Featured news stories about my National Science Foundation (NSF) CAREER Award • Virginia Tech's Bradley Department of Electrical and Computer Engineering
150. Featured news story about my Amazon Academic Research Award • Georgia Tech's College of Computing
151. "Incredible Women Advancing A.I. Research" • Forbes
152. A Multiple Classifier Approach for Multisensor Data Fusion, 7th International Conference on Information Fusion (FUSION), 2005