A shallow optical flow three-stream CNN (SOFTNet) model is proposed to predict a score that captures the likelihood of a frame being in an expression interval by fashioning the spotting task as a regression problem and introducing pseudo-labeling to facilitate the learning process.
Facial expressions vary from the visible to the subtle. In recent years, the analysis of micro-expressions, a natural occurrence resulting from the suppression of one’s true emotions, has drawn the attention of researchers because of its broad range of potential applications. However, spotting micro-expressions in long videos becomes increasingly challenging when they are intertwined with normal or macro-expressions. In this paper, we propose a shallow optical flow three-stream CNN (SOFTNet) model to predict a score that captures the likelihood of a frame being in an expression interval. By fashioning the spotting task as a regression problem, we introduce pseudo-labeling to facilitate the learning process. We demonstrate the efficacy and efficiency of the proposed approach on the recent MEGC 2020 benchmark, where state-of-the-art performance is achieved on CAS(ME)² with equally promising results on SAMM Long Videos.
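The abstract does not say how the pseudo-labels are built, so the following is only an illustrative sketch of how per-frame regression targets might be derived from annotated expression intervals. It assumes, hypothetically, that each frame is scored by the intersection-over-union between a sliding window starting at that frame and the nearest ground-truth interval. The function name `pseudo_labels`, the window length `k`, and the IoU rule are all assumptions made for illustration, not details taken from the paper.

```python
def pseudo_labels(num_frames, intervals, k):
    """Hypothetical per-frame regression targets in [0, 1].

    intervals: list of (onset, offset) frame indices, both inclusive.
    k: length in frames of the sliding window (an assumed parameter).
    """
    labels = [0.0] * num_frames
    for t in range(num_frames):
        win_start, win_end = t, t + k - 1  # inclusive window [t, t + k - 1]
        for onset, offset in intervals:
            # Number of frames shared by the window and the interval.
            inter = min(win_end, offset) - max(win_start, onset) + 1
            if inter <= 0:
                continue  # no overlap with this interval
            union = k + (offset - onset + 1) - inter
            # Keep the best overlap across all annotated intervals.
            labels[t] = max(labels[t], inter / union)
    return labels


# One expression annotated on frames 3..5 of a 10-frame clip, window of 3 frames.
labels = pseudo_labels(10, [(3, 5)], 3)
```

Under these assumptions, frames whose window aligns with the annotated interval receive targets close to 1, frames far from any expression receive 0, and a regressor trained on such targets outputs a score that can be thresholded to spot expression intervals.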