Abstract: The integration of vision and language has driven advances in artificial intelligence systems. Visual Question Answering (VQA) stands at the nexus of computer vision and natural ...