Again, MNIST is not Drosophilia of deep learning. For deep learning, it is a tri...

p1esk · on Dec 23, 2022

Many good ideas will not work well on MNIST (e.g. batch norm). Inversely many bad ideas may work on MNIST and no[t] transfer to real [computer vision].

Unless your dataset is more similar to mnist than to “real computer vision” (eg galaxies or piano rolls) and you still want to use deep learning to classify it.

stared · on Dec 23, 2022

Then, could you give an example of a concrete MNIST-like dataset you worked with?

Galaxies are not like such (more diverse, more noise, you don't start with centered images, usually more data - in size and channels).

p1esk · on Dec 23, 2022

All similarly sparse data samples would suffer from the bachnorm issue. I don’t remember if I tried a convnet with batchnorm on galaxy classification but I did try it on piano rolls - it was bad - precisely because of batchnorm, and had I first tried the same model on mnist I would have caught the issue much faster (I tested it on cifar).

I suspect a chess position evaluation would suffer from batchnorm just as much, if the intermediate feature maps remain sparse.