Learning Through Auxiliary Supervision For Multi-Modal Low-Resource Natural Language Processing