One thing to keep in mind is that predicting a short-term downturn is different from predicting long-run performance over 10, 20, or 30 years. I think the website is trying to do the former, while your article is talking about the latter.
"If you really want to own a stock that gives you no profits, no income from dividends, no voice in how it's run, and actually no value whatsoever other than the greater fool theory, then go ahead."
While I see your sentiment here, I don't think it's actually correct. Non-voting common stock will receive distributions if any cash is left over in a liquidation after debt holders, preferred stockholders, and higher-ranked common stockholders are paid (i.e., there is a real claim on assets). Also, for any common stock that doesn't pay dividends, the reason to hold it is the promise of dividends (and/or buybacks) once the company runs out of avenues for reinvesting excess income. This is true for non-voting shares as well.
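To make the "real claim on assets" point concrete, here is a minimal sketch of a strict-priority liquidation waterfall. It assumes a simplified capital structure in which each class is paid in full before the next one gets anything, and in which non-voting common is the most junior (residual) class, as the comment describes. The function name, class names, and dollar amounts are hypothetical.

```python
def liquidation_waterfall(cash, claims, residual_class):
    """Pay claims in strict seniority order; whatever is left goes to residual_class.

    claims: list of (class_name, amount_owed), most senior first.
    """
    payouts = {}
    remaining = cash
    for name, owed in claims:
        paid = min(owed, remaining)  # a class may be only partially paid
        payouts[name] = paid
        remaining -= paid
    payouts[residual_class] = remaining  # the residual claim: zero if cash ran out
    return payouts


# Hypothetical capital structure, most senior first.
senior = [("debt", 60.0), ("preferred", 20.0), ("higher_ranked_common", 10.0)]
print(liquidation_waterfall(100.0, senior, "non_voting_common"))
print(liquidation_waterfall(70.0, senior, "non_voting_common"))
```

With 100 in proceeds, the non-voting class still receives the 10 left over. With 70, the preferred stockholders are only partly paid and both common classes get nothing. The claim is real, but it is the last one in line.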
I've written up my philosophy on beating the market, which is a little less conservative than the view that markets are efficient and that investing in the indices is the only prudent way to invest:
If these claims are true (specifically, that every local minimum is a global minimum), then why did the earlier neural networks have poor performance? Why did we need advancements like pretraining via stacked RBMs and dropout in order to make deep learning converge on usable/better models?
1. Our labeled datasets were thousands of times too small.
2. Our computers were millions of times too slow.
3. We initialized the weights in a stupid way.
4. We used the wrong type of non-linearity.
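Point 4 can be illustrated with a small calculation. The reply doesn't name the "wrong" non-linearity, so this sketch assumes it refers to saturating sigmoid units and contrasts them with ReLU. The sigmoid's derivative never exceeds 0.25, so backpropagating through many sigmoid layers multiplies the gradient by at most 0.25 per layer. ReLU's derivative is exactly 1 for positive inputs. The helper names are hypothetical, and the weights are deliberately ignored so that only the activation function's contribution is isolated.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def sigmoid_grad(x):
    s = sigmoid(x)
    return s * (1.0 - s)  # maximum value is 0.25, reached at x == 0


def relu_grad(x):
    return 1.0 if x > 0 else 0.0


def chained_activation_grad(grad_fn, preactivations):
    """Product of activation derivatives along one path through the layers.

    Ignores the weight matrices, so it isolates the non-linearity's effect.
    """
    g = 1.0
    for z in preactivations:
        g *= grad_fn(z)
    return g


depth = 10
# Sigmoid's best case: every unit sits at its steepest point (z = 0).
print(chained_activation_grad(sigmoid_grad, [0.0] * depth))  # → 9.5367431640625e-07
# ReLU with active (positive) units passes the gradient through unchanged.
print(chained_activation_grad(relu_grad, [1.0] * depth))  # → 1.0
```

Even in the sigmoid's most favourable case, ten layers shrink the gradient by a factor of about a million.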
Pre-training helped with initialization, but it later turned out that simply initializing the weights with the correct scale for each layer (to deal with the vanishing/exploding gradient problem) worked almost as well, given enough data.
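The effect of the weight scale can be seen by pushing a random input through a deep ReLU network and measuring how large the activations are at the end. The reply doesn't say which scaling rule it means. This sketch assumes He initialization (weight standard deviation sqrt(2 / fan_in), a common choice for ReLU layers) and compares it with a small fixed standard deviation of 0.01. The function name, width, and depth are hypothetical.

```python
import math
import random


def forward_rms(depth, width, weight_std, seed=0):
    """Push one random input through `depth` fully connected ReLU layers
    (no biases) and return the root-mean-square of the final activations."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in range(width)]
    for _ in range(depth):
        w = [[rng.gauss(0.0, weight_std) for _ in range(width)] for _ in range(width)]
        # ReLU(W @ x), written out with plain lists
        x = [max(0.0, sum(wij * xj for wij, xj in zip(row, x))) for row in w]
    return math.sqrt(sum(v * v for v in x) / width)


width, depth = 64, 10
naive = forward_rms(depth, width, weight_std=0.01)
he = forward_rms(depth, width, weight_std=math.sqrt(2.0 / width))
print(f"std=0.01: {naive:.3e}   He scale: {he:.3f}")
```

With std 0.01, each layer shrinks the signal by roughly a factor of 18, so after ten layers almost nothing is left. Backpropagated gradients shrink in the same way. The He scale keeps the activations on the order of 1 at every depth, which is the kind of "correct scale for each layer" the reply describes.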
Poor performance is generally evaluated by considering the validation error. This paper only cares about the training error: it is about the presence or absence of bad local minima in the optimization problem that one has to solve when training an MLP with ReLU activations. The learning and the optimization problems are related, but they are not the same :)
Dropout is a tool to prevent overfitting (a large gap between training error and validation error). This paper does not say anything about overfitting, generalization, or the impact of regularization.
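For readers unfamiliar with the mechanics, here is a minimal sketch of dropout applied to a layer's activations. It uses the "inverted" variant found in modern frameworks: surviving units are scaled by 1/(1 - p_drop) during training, so nothing needs rescaling at evaluation time. The original formulation instead rescaled the weights at test time. The function name and parameters are hypothetical.

```python
import random


def dropout(activations, p_drop, training, rng):
    """Inverted dropout: during training, zero each unit with probability p_drop
    and scale the survivors by 1 / (1 - p_drop) so the expected activation is
    unchanged. At evaluation time the layer is the identity."""
    if not training or p_drop == 0.0:
        return list(activations)
    keep = 1.0 - p_drop
    return [a / keep if rng.random() < keep else 0.0 for a in activations]


rng = random.Random(0)
h = [1.0] * 10_000
train_out = dropout(h, p_drop=0.5, training=True, rng=rng)
print(sum(train_out) / len(train_out))  # close to 1.0 on average
print(dropout(h, 0.5, training=False, rng=rng)[:3])  # → [1.0, 1.0, 1.0]
```

Because the random mask changes at every step, the network cannot rely on any single unit. That is why dropout acts as a regularizer against overfitting, not as a change to which minima exist.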
Also note that unsupervised pre-training via stacked RBMs has proven mostly useless for MLPs with ReLU activations if the network is wide enough and the training set is large enough. It is unclear whether initialization via unsupervised pre-training can improve the training error at all. I think it mostly affects the validation error (although I am not sure).
Furthermore, practitioners tend to stop training before full convergence on the training set. Instead, one generally stops when the validation error stops decreasing significantly (early stopping), and one does not really care about the final training loss that could have been reached by training forever. Traditional SGD converges too slowly, and in practice this prevents checking whether we are converging to a bad local minimum on non-toy problems.
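The early-stopping rule described above can be sketched in a few lines. This version assumes a "patience" criterion: training stops once the validation loss has failed to improve by more than `min_delta` for `patience` consecutive epochs, and the weights from the best epoch are kept. The function name and the loss values below are hypothetical.

```python
def early_stopping_epoch(val_losses, patience, min_delta=0.0):
    """Return the epoch whose weights early stopping would keep.

    Only validation losses are consulted; the training loss never enters the
    decision, so we never learn how low it could have gone.
    """
    best_epoch, best_loss = 0, float("inf")
    waited = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss - min_delta:
            best_epoch, best_loss = epoch, loss
            waited = 0
        else:
            waited += 1
            if waited >= patience:
                break  # later epochs are never run
    return best_epoch


# Hypothetical validation losses: improvement stalls after epoch 4.
val = [1.0, 0.7, 0.55, 0.50, 0.49, 0.495, 0.50, 0.52, 0.51]
print(early_stopping_epoch(val, patience=3))  # → 4
print(early_stopping_epoch(val, patience=3, min_delta=0.02))  # → 3
```

A larger `min_delta` treats small gains as noise and stops earlier. This corresponds to stopping when the validation error "stops decreasing significantly".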
To sum up: a better understanding of the optimization problem is very helpful (in particular to tackle underfitting and reduce training times), but that alone will not ensure that we can build models that generalize correctly to unseen data.
Using ReLU units is a newer advancement and I agree that changing the activation function does change the cost function. However, before Hinton got all excited about ReLU units, he was still showing huge improvements just by using pretraining and later by using dropout, which shouldn't change the cost function.
Dropout helps with convergence/optimization, sure. But the existence of a global minimum says nothing about the time required to reach it. It's also important to note that dropout isn't as common anymore; it's not a huge win.
The market movement can be explained in part by changing expectations about the Fed's rate hikes: worsening economic conditions make the Fed more cautious about raising rates, so discount rates are lower and valuation models produce higher values. It's also about what the market expected earnings to be. Yes, tech earnings are beating up the NASDAQ, but people actually expected worse from the financial sector given the very low rates.
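The link between lower discount rates and higher valuations can be illustrated with the simplest discounting model, the constant-growth (Gordon) dividend discount model, where value = D1 / (r - g). The comment doesn't say which valuation model it has in mind. This choice, the function name, and the inputs (a $2 dividend next year, 3% perpetual growth) are all hypothetical.

```python
def gordon_growth_value(next_dividend, discount_rate, growth_rate):
    """Present value of a dividend stream growing forever at growth_rate.

    Gordon growth model: value = D1 / (r - g), valid only when r > g.
    """
    if discount_rate <= growth_rate:
        raise ValueError("discount rate must exceed growth rate")
    return next_dividend / (discount_rate - growth_rate)


# Hypothetical inputs: $2 dividend next year, 3% perpetual growth.
at_8_pct = gordon_growth_value(2.0, 0.08, 0.03)
at_7_pct = gordon_growth_value(2.0, 0.07, 0.03)
print(f"r = 8%: {at_8_pct:.2f}   r = 7%: {at_7_pct:.2f}")
```

Here a one-point drop in the discount rate raises the value from about 40 to about 50. Because r - g sits in the denominator, valuations are quite sensitive to rate expectations.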
Unfortunately this talk is kind of dated already. Most people don't stack RBMs or autoencoders to pretrain the weights anymore. If you use dropout with rectified linear units, you don't have to pretrain, even for large architectures.