Speech gesture generation from the trimodal context of text, audio, and speaker identity (2020-09-04)