A Modular Vision Language Navigation and Manipulation Framework for Long Horizon Compositional Tasks in Indoor Environment (2021-01-19T00:00:00.000000Z)