Visual Question Answering with Module Networks

This work employs the question-answering capabilities of Module Networks (Hu et al. [1]). Language guides the generation of neural architectures that maximize the likelihood of answering a question correctly when combined with visual embeddings.

A seq2seq architecture translates open-ended questions into a sequence of available modules (Age, Gender, Emotion, Find, Transform, Locate, And, Describe, etc.) whose in-order traversal represents the hierarchical relation between modules. 25,050 unique questions are used to generate hierarchical module networks. Some modules receive visual and language features, while others receive attention maps.
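
As a rough illustration of how a module hierarchy maps to a flat sequence, the sketch below flattens a small binary layout tree by in-order traversal. The `ModuleNode` class, the `in_order` helper, and the example layout `Describe(And(Find, Find))` are hypothetical and are not taken from this repository; single-input modules are assumed to keep their input as the left child.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ModuleNode:
    """One node of a hypothetical module layout tree (illustration only)."""
    name: str
    left: Optional["ModuleNode"] = None
    right: Optional["ModuleNode"] = None


def in_order(node: Optional[ModuleNode]) -> List[str]:
    """Return module names in in-order: left subtree, node, right subtree."""
    if node is None:
        return []
    return in_order(node.left) + [node.name] + in_order(node.right)


# Assumed example layout: Describe(And(Find, Find)).
layout = ModuleNode(
    "Describe",
    left=ModuleNode("And", left=ModuleNode("Find"), right=ModuleNode("Find")),
)
sequence = in_order(layout)
print(sequence)
```

In this sketch the two `Find` leaves surround the `And` node that combines them, and `Describe` comes last because its only input is stored as its left child.
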
The res5c layer of a ResNet-152 pretrained on ImageNet produces embedding vectors of shape (1, 14, 14, 2048).
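
To make the (1, 14, 14, 2048) shape concrete, the following sketch shows one way an attention map over the 14 x 14 grid could be combined with such features: the attention values are softmax-normalized and used to average the 2048-dimensional vectors. This is an assumption for illustration; the modules in this repository may combine attention and features differently. The `attention_pool` function and the synthetic inputs are hypothetical, and plain Python lists are used only to keep the example dependency-free.

```python
import math
from typing import List

GRID = 14        # spatial size of the res5c feature map
CHANNELS = 2048  # channel depth of the res5c feature map


def attention_pool(features: List[List[List[float]]],
                   attention: List[List[float]]) -> List[float]:
    """Softmax the attention logits over the grid, then return the
    attention-weighted average of the per-cell feature vectors."""
    peak = max(a for row in attention for a in row)
    exps = [[math.exp(a - peak) for a in row] for row in attention]
    total = sum(sum(row) for row in exps)
    pooled = [0.0] * len(features[0][0])
    for y, row in enumerate(features):
        for x, cell in enumerate(row):
            weight = exps[y][x] / total
            for c, value in enumerate(cell):
                pooled[c] += weight * value
    return pooled


# Synthetic batch of shape (1, 14, 14, 2048): every channel of cell (y, x)
# holds the cell index y * 14 + x, so the result is easy to check by hand.
batch = [[[[float(y * GRID + x)] * CHANNELS for x in range(GRID)]
          for y in range(GRID)]]

# Attention logits that focus almost entirely on cell (3, 5).
focused = [[0.0] * GRID for _ in range(GRID)]
focused[3][5] = 50.0

pooled = attention_pool(batch[0], focused)
print(len(pooled), round(pooled[0], 3))
```

With the focused map the pooled vector is close to the features of cell (3, 5); with a uniform map it reduces to the plain spatial average.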

Citation

[1] R. Hu, J. Andreas, M. Rohrbach, T. Darrell, K. Saenko. Learning to Reason: End-to-End Module Networks for Visual Question Answering. arXiv preprint arXiv:1704.05526, 2017.

@article{hu2017learning,
  title={Learning to Reason: End-to-End Module Networks for Visual Question Answering},
  author={Hu, Ronghang and Andreas, Jacob and Rohrbach, Marcus and Darrell, Trevor and Saenko, Kate},
  journal={arXiv preprint arXiv:1704.05526},
  year={2017}
}

