Sorry if I mis-conveyed the ideas. They are quite different.
The OpenAI paper introduces operations that are a fast middle ground between dense and sparse operations. You still have to specify the sparsity structure you want. (Although often a random sparsity structure works well.)
The MIT paper describes one way to choose a sparsity structure and starting point that will work well in the general case.
The OpenAI approach is more amenable to an obvious HW implementation because of the block sparsity: blocks are how GEMM operations are implemented in the first place.
There are obviously more sparse solutions available if the block sparsity constraint is relaxed, so I wouldn't be surprised if the best results come from such a network.
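To make the block-sparsity point concrete, here is a minimal NumPy sketch (not the actual OpenAI kernels, which run on GPU) of a block-sparse matmul: only the blocks listed in a hypothetical `pattern` are stored, and each nonzero block contributes a small dense GEMM, which is why the structure maps so naturally to hardware. Block size `B` and all names here are illustrative assumptions.

```python
import numpy as np

B = 4  # assumed block size; real kernels use sizes like 8/16/32

def block_sparse_matmul(blocks, pattern, x, n_rows):
    """Multiply a block-sparse matrix by dense x.

    blocks[k]  : dense (B, B) array holding one nonzero block
    pattern[k] : (block_row, block_col) position of that block
    """
    y = np.zeros((n_rows, x.shape[1]))
    for blk, (i, j) in zip(blocks, pattern):
        # each stored block is just a small dense GEMM
        y[i*B:(i+1)*B] += blk @ x[j*B:(j+1)*B]
    return y

# a random sparsity pattern over a 2x2 grid of blocks, keeping 2 of 4
rng = np.random.default_rng(0)
pattern = [(0, 0), (1, 1)]
blocks = [rng.standard_normal((B, B)) for _ in pattern]
x = rng.standard_normal((2 * B, 3))
y = block_sparse_matmul(blocks, pattern, x, n_rows=2 * B)

# sanity check against the equivalent dense matrix with zeroed blocks
dense = np.zeros((2 * B, 2 * B))
for blk, (i, j) in zip(blocks, pattern):
    dense[i*B:(i+1)*B, j*B:(j+1)*B] = blk
assert np.allclose(y, dense @ x)
```

With unstructured sparsity you lose this property: every nonzero lands at an arbitrary position, so you can't batch the work into dense tiles.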