No no no you are thinking this one a little wrong. Yes Pythagorean identity, or ...

graycat · on June 3, 2017

Yes, now we can also do best L^1 approximations. IIRC -- I'm in a hurry this morning -- it's a linear programming problem or some such but I haven't thought about that in decades.

You bring up a lot of stuff I've never reviewed.

> but that does not mean it is a suitable performance metric to use.

If the plane do the L^2 projection on is really close, then the error is really small, and what's not to like? Really close is good enough for me.

Right, there are three biggie choices, L^1 (minimize the sum of absolute values of the errors), L^2 (minimize the sum of squared errors), and L^infinity (minimize the worst error).

L^2's fine with me, good enough for government work, okay first cut, day in and day out.

If in some particular case L^2 has some problems, then maybe something like regression is not the right tool for the problem.

Gee, we do a lot of L^2: We get orthogonal components so that we can get L^2 cafeteria style, just pick the components we want. E.g., in filtering stochastic processes, we take a Fourier transform -- that is finding the coefficients of the sample path on the sine-cosine orthogonal components. Or we do a convolution, which is the same thing in the end. The approximation we get is an L^2 approximation.

So, with JPG -- sure, it's L^2. Right, JPG does funny stuff nearly lines. Life's not perfect!

For some things, sure we want L^infinity: E.g., if we want to use a quotient of polynomials to approximate the usual special functions, then we want to minimize the worst error and do what is called Chebyshev approximation, but this is very specialized.

srean · on June 3, 2017

> it's a linear programming problem

Indeed it is. Lets catchup more sometime. Here's my email again srean.list on gmail

Totally agreed there is lot to like about L2 but there are plenty situations where it is terrible (in fact some of them are on one topic of your interest: monitoring server farms). In some of those situations L1 or a combination of L1 and L2 helps a lot.

graycat · on June 3, 2017

> You say variance covariance etc etc, but it does not take much for RVs not to possess them

In practice, essentially always, for real random variables X, Y, all of the following exist and are finite:

E[X], E[|X|], E[X^2], E[XY]

Var(X) = E[(X - E[X])^2]

Std(X) = Var(X)^(1/2)

Cov(X,Y) = E[(X - E[X]) (Y - E[Y])]

Cor(X,Y) = Cov(X,Y)/(Std(X) Std(Y))

E[Y|X]

E[X] and Std(X)are useful quite broadly and that we find/estimate them does not mean that we are working with a Gaussian distribution.

If the above is not true, then there is something bizarre and/or pathological, the ultimate edge case, and we need to review what we are doing.

srean · on June 4, 2017

Unfortunately its not true in many cases I have seen, essentially because the tail of the error does not fall fast enough. Technically, for all bounded RVs all moments are bounded, but in some of these situations the variance is so high that its infinite for practical purposes.

All you need is the tail to fall slower than a quadratic.