An effective spatiotemporal feature alignment network tailored to VSP is developed, comprising two key sub-networks: a multi-scale deformable convolutional alignment network (MDAN) and a bidirectional convolutional Long Short-Term Memory (Bi-ConvLSTM) network.