Sorry if I mis-conveyed the ideas. They are quite different.
The OpenAI paper introduces operations that are a fast middle ground between dense and sparse operations. You still have to specify the sparsity structure you want. (Although often a random sparsity structure works well.)
The MIT paper describes one way to choose a sparsity structure and starting point that will work well in the general case.
The OpenAI approach is more amenable to an obvious HW implementation because of the block sparsity: blocks are how GEMM operations are implemented in the first place.
There are obviously more sparse solutions available if the block sparsity constraint is relaxed, so I wouldn't be surprised if the best results come from such a network.
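To make the block-sparsity point concrete, here is a minimal NumPy sketch (not the actual OpenAI kernels, which run on GPU) of a block-sparse matmul: only the blocks listed in a hypothetical `pattern` are stored, and each nonzero block contributes a small dense GEMM, which is why the structure maps so naturally to hardware. Block size `B` and all names here are illustrative assumptions.

```python
import numpy as np

B = 4  # assumed block size; real kernels use sizes like 8/16/32

def block_sparse_matmul(blocks, pattern, x, n_rows):
    """Multiply a block-sparse matrix by dense x.

    blocks[k]  : dense (B, B) array holding one nonzero block
    pattern[k] : (block_row, block_col) position of that block
    """
    y = np.zeros((n_rows, x.shape[1]))
    for blk, (i, j) in zip(blocks, pattern):
        # each stored block is just a small dense GEMM
        y[i*B:(i+1)*B] += blk @ x[j*B:(j+1)*B]
    return y

# a random sparsity pattern over a 2x2 grid of blocks, keeping 2 of 4
rng = np.random.default_rng(0)
pattern = [(0, 0), (1, 1)]
blocks = [rng.standard_normal((B, B)) for _ in pattern]
x = rng.standard_normal((2 * B, 3))
y = block_sparse_matmul(blocks, pattern, x, n_rows=2 * B)

# sanity check against the equivalent dense matrix with zeroed blocks
dense = np.zeros((2 * B, 2 * B))
for blk, (i, j) in zip(blocks, pattern):
    dense[i*B:(i+1)*B, j*B:(j+1)*B] = blk
assert np.allclose(y, dense @ x)
```

With unstructured sparsity you lose this property: every nonzero lands at an arbitrary position, so you can't batch the work into dense tiles.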