In this paper, we explain the mechanism of bilinear pooling as a module of hard sample generation, and find that bilinear pooling significantly expands the variances of the first-order feature vectors when it produces discriminative bilinear features. In conjunction with the extremely high dimensionality of the resulting bilinear features, these enlarged variances lead to overfitting in subsequent learning models. To solve this issue, we construct a bi-level optimization problem, where the high-level problem is the supervised classification loss and the low-level problem is principal component analysis (PCA). We then show that PCA on bilinear features is equivalent to spectral clustering, which allows us to prove mathematically that the first <inline-formula><tex-math notation="LaTeX">$\log _{2}(C)$</tex-math><alternatives><mml:math><mml:mrow><mml:msub><mml:mo form="prefix">log</mml:mo><mml:mn>2</mml:mn></mml:msub><mml:mrow><mml:mo>(</mml:mo><mml:mi>C</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:mrow></mml:math><inline-graphic xlink:href="han-ieq1-3601355.gif"/></alternatives></inline-formula> principal components suffice to preserve the discriminative information of <inline-formula><tex-math notation="LaTeX">$C$</tex-math><alternatives><mml:math><mml:mi>C</mml:mi></mml:math><inline-graphic xlink:href="han-ieq2-3601355.gif"/></alternatives></inline-formula> classes. By removing the remaining principal components, the dimensionality and the variances are reduced simultaneously. To the best of our knowledge, this is the first work to provide a lower bound on the reduced dimensionality for bilinear pooling. However, the PCA projection matrix <inline-formula><tex-math notation="LaTeX">$\mathbf{L}$</tex-math><alternatives><mml:math><mml:mi mathvariant="bold">L</mml:mi></mml:math><inline-graphic xlink:href="han-ieq3-3601355.gif"/></alternatives></inline-formula> is itself prone to overfitting because it contains a large number of parameters. 
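The pipeline described above can be sketched numerically. The snippet below is a minimal illustration, not the paper's implementation: it assumes toy Gaussian local features, forms bilinear (sum-of-outer-product) features, observes their enlarged variance relative to first-order features, and keeps only the leading ceil(log2(C)) principal components as the text's lower bound suggests.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy local CNN features: n samples, each with t spatial positions and d channels.
n, t, d, C = 100, 49, 16, 8
X = rng.standard_normal((n, t, d))

# Bilinear pooling: sum of outer products over spatial positions,
# flattened to a d*d-dimensional second-order feature per sample.
B = np.einsum('ntd,nte->nde', X, X).reshape(n, d * d)

# The second-order features exhibit much larger variance than the
# (average-pooled) first-order features.
first_order_var = X.mean(axis=1).var()
bilinear_var = B.var()

# PCA: keep only the leading ceil(log2(C)) principal components, the
# lower bound discussed in the text for C-class discrimination.
k = int(np.ceil(np.log2(C)))
Bc = B - B.mean(axis=0)
_, _, Vt = np.linalg.svd(Bc, full_matrices=False)
Z = Bc @ Vt[:k].T          # n x k low-dimensional bilinear features
print(Z.shape)             # (100, 3)
```

Here the 256-dimensional bilinear features are compressed to only 3 dimensions for 8 classes, while the variance gap motivates the dimension/variance reduction argued for in the text.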
To address this issue, we propose a rank-<inline-formula><tex-math notation="LaTeX">$k$</tex-math><alternatives><mml:math><mml:mi>k</mml:mi></mml:math><inline-graphic xlink:href="han-ieq4-3601355.gif"/></alternatives></inline-formula> general bilinear projection (RK-GBP) that decomposes <inline-formula><tex-math notation="LaTeX">$\mathbf{L}$</tex-math><alternatives><mml:math><mml:mi mathvariant="bold">L</mml:mi></mml:math><inline-graphic xlink:href="han-ieq5-3601355.gif"/></alternatives></inline-formula> into two small matrices <inline-formula><tex-math notation="LaTeX">$\mathbf{U}$</tex-math><alternatives><mml:math><mml:mi mathvariant="bold">U</mml:mi></mml:math><inline-graphic xlink:href="han-ieq6-3601355.gif"/></alternatives></inline-formula> and <inline-formula><tex-math notation="LaTeX">$\mathbf{V}$</tex-math><alternatives><mml:math><mml:mi mathvariant="bold">V</mml:mi></mml:math><inline-graphic xlink:href="han-ieq7-3601355.gif"/></alternatives></inline-formula>, which contain far fewer learnable parameters. Unlike the traditional bilinear projections used in factorized bilinear pooling (FBiP), our RK-GBP preserves the column orthogonality of <inline-formula><tex-math notation="LaTeX">$\mathbf{L}$</tex-math><alternatives><mml:math><mml:mi mathvariant="bold">L</mml:mi></mml:math><inline-graphic xlink:href="han-ieq8-3601355.gif"/></alternatives></inline-formula> by constraining the columns of <inline-formula><tex-math notation="LaTeX">$\mathbf{U}$</tex-math><alternatives><mml:math><mml:mi mathvariant="bold">U</mml:mi></mml:math><inline-graphic xlink:href="han-ieq9-3601355.gif"/></alternatives></inline-formula> and <inline-formula><tex-math notation="LaTeX">$\mathbf{V}$</tex-math><alternatives><mml:math><mml:mi mathvariant="bold">V</mml:mi></mml:math><inline-graphic xlink:href="han-ieq10-3601355.gif"/></alternatives></inline-formula> to be orthogonal. 
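The parameter saving and the orthogonality transfer can be illustrated as follows. This is a hedged sketch of the general idea, not the paper's exact RK-GBP: it assumes a channel dimension d = 512 and rank k = 8, builds column-orthogonal U and V via QR, and applies the factored projection U^T M V to an outer-product matrix M, which corresponds to projecting vec(M) with L = kron(V, U), a matrix whose columns are orthonormal whenever those of U and V are.

```python
import numpy as np

rng = np.random.default_rng(0)
d, k = 512, 8   # hypothetical channel dimension and target rank

# A full projection on d*d bilinear features to a k*k output needs a huge L.
full_params = (d * d) * (k * k)

# Factored alternative: two small column-orthogonal matrices U and V.
U, _ = np.linalg.qr(rng.standard_normal((d, k)))   # U^T U = I_k
V, _ = np.linalg.qr(rng.standard_normal((d, k)))   # V^T V = I_k
factored_params = 2 * d * k

# Project an outer-product matrix M without ever forming vec(M):
x = rng.standard_normal(d)
M = np.outer(x, x)
Z = U.T @ M @ V      # k x k feature; equals L^T vec(M) with L = kron(V, U),
                     # whose columns inherit orthonormality from U and V.

print(full_params // factored_params)   # parameter reduction factor: 2048
```

With these (assumed) sizes, the factored form stores 8,192 parameters instead of roughly 16.8 million, while the orthogonality constraint on U and V carries over to the implicit L.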
For computational efficiency, we relax the PCA in the low-level task into a dictionary learning problem, obtaining the rank-<inline-formula><tex-math notation="LaTeX">$k$</tex-math><alternatives><mml:math><mml:mi>k</mml:mi></mml:math><inline-graphic xlink:href="han-ieq11-3601355.gif"/></alternatives></inline-formula> orthogonal factorization bilinear pooling (RK-OFBP). RK-OFBP can be viewed as a generalization of existing factorized bilinear pooling methods (e.g., Hadamard product-based ones). Finally, we evaluate our approach on fine-grained image datasets and large-scale datasets, demonstrating that the proposed method not only produces extremely low-dimensional features but also outperforms other methods in classification tasks. For example, our RK-OFBP uses only 32-dimensional vectors to achieve results comparable to B-CNN (Lin, 2015), whose features have dimension 512×512, on a 200-class classification task.
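The claim that Hadamard product-based factorized pooling is a special case of the general bilinear form can be checked with a small numerical example. This sketch (with assumed toy sizes, not the paper's setting) shows that the Hadamard feature (U^T x) ∘ (V^T x) is exactly the diagonal of the full rank-k bilinear projection U^T (x x^T) V.

```python
import numpy as np

rng = np.random.default_rng(0)
d, k = 16, 4
x = rng.standard_normal(d)

U = rng.standard_normal((d, k))
V = rng.standard_normal((d, k))

# Hadamard product-based factorized bilinear pooling:
# z_i = (U^T x)_i * (V^T x)_i, a k-dimensional feature.
z_hadamard = (U.T @ x) * (V.T @ x)

# The same values appear on the diagonal of the full bilinear projection
# U^T (x x^T) V, so Hadamard pooling keeps only the diagonal of the
# general k x k bilinear output.
z_full = np.diag(U.T @ np.outer(x, x) @ V)

print(np.allclose(z_hadamard, z_full))   # True
```

In this sense a general rank-k bilinear projection retains the full k×k interaction matrix, of which the Hadamard product variant is the diagonal restriction.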