A novel feature fusion strategy that proceeds hierarchically: it first fuses the modalities pairwise and only then fuses all three modalities together. This strategy outperforms conventional feature concatenation by 1%, which amounts to a 5% reduction in error rate.
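The two-stage idea can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the three modality vectors `a`, `b`, `c`, the common dimension `d`, the single tanh dense layer used as the fusion operator, and the helper names `dense_fuse`, `init_layer`, and `hierarchical_fusion` are all assumptions made for illustration, and the weights are random rather than trained.

```python
import numpy as np


def dense_fuse(inputs, weight, bias):
    """Fuse several vectors: concatenate them, then apply one tanh dense layer."""
    return np.tanh(weight @ np.concatenate(inputs) + bias)


def init_layer(rng, in_dim, out_dim):
    """Random, untrained parameters (illustration only)."""
    weight = rng.standard_normal((out_dim, in_dim)) * 0.1
    bias = np.zeros(out_dim)
    return weight, bias


def hierarchical_fusion(a, b, c, params):
    # Stage 1: fuse the modalities two at a time.
    ab = dense_fuse([a, b], *params["ab"])
    ac = dense_fuse([a, c], *params["ac"])
    bc = dense_fuse([b, c], *params["bc"])
    # Stage 2: fuse the three pairwise representations into one vector.
    return dense_fuse([ab, ac, bc], *params["abc"])


rng = np.random.default_rng(0)
d = 8  # assumed common feature dimension for all three modalities

# Hypothetical unimodal feature vectors (e.g. produced by three encoders).
a, b, c = (rng.standard_normal(d) for _ in range(3))

params = {
    "ab": init_layer(rng, 2 * d, d),
    "ac": init_layer(rng, 2 * d, d),
    "bc": init_layer(rng, 2 * d, d),
    "abc": init_layer(rng, 3 * d, d),
}

fused = hierarchical_fusion(a, b, c, params)

# Conventional baseline for comparison: plain concatenation of the features.
baseline = np.concatenate([a, b, c])

print(fused.shape, baseline.shape)
```

In this sketch the pairwise stage gives each pair of modalities its own learned interaction before the final stage sees them, whereas the concatenation baseline passes all three feature vectors to the downstream classifier at once.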