Our greyscale pre-training was originally motivated by domain alignment with medical imagery, but its broader benefits may have been underestimated. Continuing our investigation of greyscale backbones, we examine whether removing colour brings unexpected advantages in efficiency and robustness, evaluated before any downstream transfer. We compare ResNet-50 models trained on ImageNet-1K with RGB, 3-channel greyscale, and 1-channel greyscale inputs, so that colour representation is the only difference between them.
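The three input representations can be illustrated with a short NumPy sketch. It rests on two assumptions that are not stated in the text. First, greyscale is computed with the ITU-R BT.601 luma weights, the same weights PIL uses for `convert("L")`. Second, "3-channel greyscale" means that luma channel replicated three times, so an unmodified RGB network can consume it. The helper names (`to_rgb`, `to_grey_1ch`, `to_grey_3ch`, `stem_conv_params`) are hypothetical. Under these assumptions, the architectures differ only in the input channels of ResNet-50's first 7x7 convolution, which has 64 output channels and no bias in torchvision's design.

```python
import numpy as np

# Assumption: ITU-R BT.601 luma weights (as used by PIL's convert("L"));
# the exact RGB-to-grey conversion used in the study is not specified here.
LUMA_WEIGHTS = np.array([0.299, 0.587, 0.114], dtype=np.float32)


def to_rgb(rgb: np.ndarray) -> np.ndarray:
    """HxWx3 uint8 image -> 3xHxW float32 tensor (channels-first layout)."""
    return np.transpose(rgb.astype(np.float32), (2, 0, 1))


def to_grey_1ch(rgb: np.ndarray) -> np.ndarray:
    """HxWx3 image -> 1xHxW luma tensor (weighted sum over the colour axis)."""
    grey = rgb.astype(np.float32) @ LUMA_WEIGHTS  # shape HxW
    return grey[np.newaxis, :, :]


def to_grey_3ch(rgb: np.ndarray) -> np.ndarray:
    """Replicate the luma channel three times (assumed meaning of '3-channel greyscale')."""
    return np.repeat(to_grey_1ch(rgb), 3, axis=0)


def stem_conv_params(in_channels: int, out_channels: int = 64, kernel: int = 7) -> int:
    """Weight count of a bias-free k x k stem convolution, as in ResNet-50's conv1."""
    return in_channels * out_channels * kernel * kernel


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    image = rng.integers(0, 256, size=(224, 224, 3), dtype=np.uint8)
    for name, convert in [("RGB", to_rgb), ("grey-3ch", to_grey_3ch), ("grey-1ch", to_grey_1ch)]:
        x = convert(image)
        # Input tensor size in bytes and conv1 weight count for each variant.
        print(name, x.shape, x.nbytes, stem_conv_params(x.shape[0]))
```

For a 224x224 float32 input, the RGB and 3-channel greyscale tensors both occupy 602,112 bytes and use a stem with 9,408 weights. The 1-channel tensor occupies 200,704 bytes, and its stem has 3,136 weights. The weight difference is small relative to ResNet-50's roughly 25.6 million parameters, whereas the input tensor shrinks threefold.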
We first examine computational efficiency: the 1-channel model completes epochs faster and consumes significantly less memory during training. Beyond efficiency, we assess robustness using the model-vs-human benchmark