Hi all,
Thanks for the amazing work, and for publishing it :)
I am interested in the spatial component of the NaVQA dataset. What does the spatial position refer to, specifically when a question targets an object? That is, is the correct label the position from which the object was observed, or the position of the object itself?
Thanks!