Again, MNIST is not the Drosophila of deep learning.
For deep learning, it is trivial, and often misleading, as François Chollet noted:
> Many good ideas will not work well on MNIST (e.g. batch norm). Inversely many bad ideas may work on MNIST and no[t] transfer to real [computer vision].
It might be considered the Drosophila of machine learning, though. Take a look at this beautiful table of results for various classifiers: http://yann.lecun.com/exdb/mnist/.
> Many good ideas will not work well on MNIST (e.g. batch norm). Inversely many bad ideas may work on MNIST and no[t] transfer to real [computer vision].
Unless your dataset is more similar to MNIST than to “real computer vision” (e.g. galaxies or piano rolls), and you still want to use deep learning to classify it.
All similarly sparse data would suffer from the batchnorm issue. I don’t remember whether I tried a convnet with batchnorm on galaxy classification, but I did try one on piano rolls. It performed badly, precisely because of batchnorm, and had I first tried the same model on MNIST, I would have caught the issue much faster (I had tested it on CIFAR).
I suspect a chess position evaluation would suffer from batchnorm just as much, if the intermediate feature maps remain sparse.
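The comment does not spell out what the batchnorm issue with sparse inputs actually is. One plausible mechanism (an assumption, not something stated above) is that in training mode batch norm normalizes each channel with the current batch's own mean and variance. For a channel that is zero for most samples, the normalized value of an active sample then depends heavily on how many other samples in the batch happen to be active. A minimal NumPy sketch, using a hypothetical single-channel `sparse_batch` helper and omitting the learned scale and shift:

```python
import numpy as np


def batch_norm_train(x: np.ndarray, eps: float = 1e-5) -> np.ndarray:
    """Training-mode batch normalization over the batch axis (axis 0).

    The learned scale (gamma) and shift (beta) are omitted for clarity.
    """
    mean = x.mean(axis=0)
    var = x.var(axis=0)  # biased variance, as batch norm uses for the batch statistics
    return (x - mean) / np.sqrt(var + eps)


def sparse_batch(batch_size: int, n_active: int) -> np.ndarray:
    """Hypothetical one-channel batch: n_active samples are 1.0, the rest are 0.0."""
    x = np.zeros((batch_size, 1))
    x[:n_active, 0] = 1.0
    return x


batch_size = 32
for n_active in (1, 4, 16):
    out = batch_norm_train(sparse_batch(batch_size, n_active))
    # Sample 0 is always active with the same raw value 1.0,
    # yet its normalized value changes with the batch composition.
    print(f"{n_active:2d} active -> normalized value of sample 0: {out[0, 0]:.3f}")
```

With one active sample out of 32, the raw value 1.0 normalizes to about 5.57; with 16 active samples it normalizes to about 1.0 (roughly sqrt((N - k) / k) for k active samples out of N, ignoring `eps`). Since inference uses running averages rather than per-batch statistics, this batch-to-batch variation is one way training and test behavior could diverge on sparse data, whereas a dense dataset like CIFAR would not expose it.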