It is shown that CNN in classification mode can be trained from scratch using the consecutive bounding boxes of each identity, and the learned CNN embedding outperforms other competing methods considerably and has good generalization ability on other video re-id datasets upon fine-tuning.