Abstract
The University of Central Florida invention is a capsule network approach that enhances Visual Question Answering (VQA). The method applies reasoning within a visual scene to recognize objects, actions, and relations more accurately. Most existing approaches rely on input feature maps from object detection models pretrained on the relevant object classes, which makes it necessary either to restrict the scope to known object classes or to annotate the regions of relevant objects. Such approaches also require pretraining an object detector, limiting their extension to datasets with object-level annotations. This work instead focuses on weakly-supervised visual grounding based only on VQA supervision.
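The core idea of weakly-supervised grounding can be illustrated with a minimal sketch. This is an assumption-laden simplification, not the authors' capsule implementation: capsule routing is replaced here by a plain attention mechanism, and all function names, feature shapes, and the random toy data are invented for illustration. The point it shows is that region-level attention (the "grounding") can emerge even though the only supervision signal is the answer label:

```python
import math
import random

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def grounded_vqa_forward(region_feats, question_feat, answer_weights):
    """Forward pass of a toy weakly-supervised grounding model.

    region_feats:   list of R region feature vectors (each length D)
    question_feat:  question embedding (length D)
    answer_weights: A x D classifier weights over candidate answers

    Only the answer logits would receive a training loss; the
    attention weights (the grounding) are never directly supervised.
    """
    # Score each image region against the question embedding.
    scores = [dot(r, question_feat) for r in region_feats]
    attn = softmax(scores)  # attention doubles as the grounding signal

    # Attention-weighted pooling of region features.
    dim = len(question_feat)
    pooled = [sum(attn[i] * region_feats[i][d] for i in range(len(region_feats)))
              for d in range(dim)]

    # Answer logits: the only output that sees a supervised loss.
    logits = [dot(w, pooled) for w in answer_weights]

    # The most-attended region is the predicted answer localization.
    grounding = max(range(len(attn)), key=attn.__getitem__)
    return logits, attn, grounding

# Toy usage with random features (shapes are arbitrary choices).
random.seed(0)
R, D, A = 5, 8, 10
regions = [[random.gauss(0, 1) for _ in range(D)] for _ in range(R)]
question = [random.gauss(0, 1) for _ in range(D)]
weights = [[random.gauss(0, 1) for _ in range(D)] for _ in range(A)]
logits, attn, grounded_region = grounded_vqa_forward(regions, question, weights)
```

Because gradients from the answer loss flow back through the attention weights, the model learns to attend to answer-relevant regions without any region annotations, which is what makes the grounding "weakly supervised."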
Stage of Development
Prototype available.
Benefit
- Simplicity
- Significantly better at answer localization
- Increases a system's explainability

Market Application
- Explainable AI
- Accessibility applications for visually impaired people
- Evidence-based decision-making systems
- Dialog-based systems

Publications
Found a Reason for me? Weakly-supervised Grounded Visual Question Answering using Capsules. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2021. arXiv:2105.04836.