Transformer-based methods have shown great potential in single image super-resolution (SISR), particularly for capturing long-range dependencies, and are now being applied to magnetic resonance imaging (MRI) super-resolution. This paper introduces the Multi-Scale Contextual Transformer (MSCT), designed specifically for MRI super-resolution. The network consists of a shallow feature extraction module, cross-feature enhanced residual groups (CFERGs), and a reconstruction module. Shallow features are extracted by a $3 \times 3$ convolution layer and then passed through a stack of CFERGs, which integrate global and local dependencies via feature refinement fusion blocks (FRFBs), each combining a global attention capture block (GACB) with a channel fusion enhancement module (CFEM). This design effectively fuses multi-scale information, captures global context through a rectangular attention mechanism, and reduces computational complexity. Experimental results show that MSCT improves reconstruction accuracy, suppresses noise, and preserves fine details in MR images.
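To make the described pipeline concrete, the following is a minimal PyTorch sketch of the overall structure (shallow $3 \times 3$ convolution, a stack of residual groups, and a pixel-shuffle reconstruction head). The abstract does not specify the internals of the GACB or CFEM, so they are stubbed here with plain multi-head self-attention and a $1 \times 1$-convolution channel mixer; all class names other than those in the paper, and all hyperparameters (channel width, block and group counts, upscaling factor), are illustrative assumptions rather than the authors' configuration.

```python
import torch
import torch.nn as nn


class FRFB(nn.Module):
    """Feature refinement fusion block (placeholder internals).

    The paper pairs a global attention capture block (GACB) with a
    channel fusion enhancement module (CFEM); both are stand-ins here,
    since the abstract does not specify them. The GACB is approximated
    by plain multi-head self-attention over flattened spatial positions
    (the paper uses a rectangular attention mechanism instead).
    """

    def __init__(self, channels: int, heads: int = 4):
        super().__init__()
        self.norm = nn.LayerNorm(channels)
        self.attn = nn.MultiheadAttention(channels, heads, batch_first=True)
        # Stand-in for the CFEM: a channel-mixing 1x1-conv MLP.
        self.cfem = nn.Sequential(
            nn.Conv2d(channels, channels * 2, 1),
            nn.GELU(),
            nn.Conv2d(channels * 2, channels, 1),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        seq = self.norm(x.flatten(2).transpose(1, 2))   # (B, H*W, C)
        attn_out, _ = self.attn(seq, seq, seq)
        x = x + attn_out.transpose(1, 2).reshape(b, c, h, w)
        return x + self.cfem(x)


class CFERG(nn.Module):
    """Cross-feature enhanced residual group: a chain of FRFBs with a
    residual connection around the whole group."""

    def __init__(self, channels: int, num_blocks: int = 4):
        super().__init__()
        self.blocks = nn.Sequential(*[FRFB(channels) for _ in range(num_blocks)])
        self.conv = nn.Conv2d(channels, channels, 3, padding=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.conv(self.blocks(x))


class MSCT(nn.Module):
    """Shallow 3x3 conv -> CFERG stack -> pixel-shuffle reconstruction."""

    def __init__(self, channels: int = 64, num_groups: int = 4, scale: int = 2):
        super().__init__()
        self.shallow = nn.Conv2d(1, channels, 3, padding=1)
        self.groups = nn.Sequential(*[CFERG(channels) for _ in range(num_groups)])
        self.reconstruct = nn.Sequential(
            nn.Conv2d(channels, channels * scale ** 2, 3, padding=1),
            nn.PixelShuffle(scale),
            nn.Conv2d(channels, 1, 3, padding=1),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        feats = self.shallow(x)
        return self.reconstruct(feats + self.groups(feats))


if __name__ == "__main__":
    lr = torch.randn(1, 1, 60, 60)        # single-channel low-resolution MR slice
    print(MSCT(scale=2)(lr).shape)        # torch.Size([1, 1, 120, 120])
```

The global residual connection around the CFERG stack mirrors the common SISR practice of letting the deep branch learn only the high-frequency residual on top of the shallow features, which matches the residual-group structure the abstract describes.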