You can't apply this stuff mindlessly. Consider this suggestion from the article: "Differences between people: (Height, Weight, Age)". If you use the formula, you end up adding inch^2 + pound^2 + year^2. Mixing units is bad enough; now imagine how the results change if you switch to the metric system.
I think the missing element is this: you need to establish that the domain that you are trying to measure is a metric space before trying to measure distance using this formula.
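To make the unit problem concrete, here's a small sketch (the people and their measurements are made up) showing that the same pair of people ends up at a different "distance" depending on whether you measure in imperial or metric units:

```python
import math

def euclidean(a, b):
    """Straight-line distance between two equal-length tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Hypothetical people as (height, weight, age) in imperial units.
alice_imperial = (65, 130, 30)   # inches, pounds, years
bob_imperial   = (70, 135, 31)

# The same two people in metric units.
alice_metric = (165.1, 59.0, 30)  # cm, kg, years
bob_metric   = (177.8, 61.2, 31)

# The "distance" between the same pair of people depends on the units:
print(euclidean(alice_imperial, bob_imperial))
print(euclidean(alice_metric, bob_metric))
```

Normalizing each dimension first (e.g. to z-scores across your population) is one common way to make the result unit-independent.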
Exactly! It is a big mistake to assume that you can express relations between the entities you study with a single number. However, I've never heard of non-metric topological spaces being used in applications.
The use of the color distance measure turns out to be important in spam filtering because of the 'Camouflage' trick used by some spammers where similar, but not identical colors are used to mask chunks of 'good' text inserted to try to fool spam filters.
Camouflage (GWI!Camouflage!HTML)
What: Like Invisible Ink, but instead of using identical colors (e.g. white on white) use very similar colors.
Date added: June 2, 2003
Example from the wild:
(The colors 1133333, 123939, and 423939 are chosen to be very similar without being the same.)
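As a sketch of why a distance measure helps here: treat each color as a point in RGB space and compare the distance between the camouflage pair with a visually obvious pair. The helper names are mine, not from any particular spam filter:

```python
import math

def hex_to_rgb(hex_color):
    """Parse a 6-digit hex color like '123939' into an (r, g, b) tuple."""
    return tuple(int(hex_color[i:i + 2], 16) for i in (0, 2, 4))

def color_distance(c1, c2):
    """Euclidean distance between two colors treated as points in RGB space."""
    return math.sqrt(sum((a - b) ** 2
                         for a, b in zip(hex_to_rgb(c1), hex_to_rgb(c2))))

# Two of the camouflage colors from the example above:
print(color_distance("123939", "423939"))  # small: differ only in the red channel
# A visually obvious pair for comparison:
print(color_distance("000000", "ffffff"))  # large: black vs. white
```

A filter can then flag text whose foreground/background color distance falls below some small threshold, catching "similar but not identical" pairs that an exact-match check would miss.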
But finding the distance between two N-D points isn't really hard at all.
What is hard is finding the distance between a point and a set of points. Or a set and a set.
Doing exhaustive search is wasteful.
If you're interested, look into K-D trees as a real solution. Best-bin-first modified K-D trees are the basis of the feature matching in the SIFT object instance recognition algorithm. Break an image into a set of N-D features; matching a geometrically consistent subset of those features to a previously seen object works extremely well.
The algorithm is general: change or add features to make it more powerful or faster. But the idea of using a set of N-D linear features to describe an object should last for years.
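For anyone curious, here's a minimal pure-Python K-D tree sketch (not the best-bin-first variant used in SIFT) showing how nearest-neighbor search can prune whole subtrees instead of exhaustively scanning every point:

```python
import math

def build_kdtree(points, depth=0):
    """Recursively build a K-D tree, splitting on alternating axes."""
    if not points:
        return None
    axis = depth % len(points[0])
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2
    return {
        "point": points[mid],
        "left": build_kdtree(points[:mid], depth + 1),
        "right": build_kdtree(points[mid + 1:], depth + 1),
    }

def nearest(node, target, depth=0, best=None):
    """Return the tree point closest to target, pruning far branches."""
    if node is None:
        return best
    dist = lambda p: math.dist(p, target)
    if best is None or dist(node["point"]) < dist(best):
        best = node["point"]
    axis = depth % len(target)
    diff = target[axis] - node["point"][axis]
    near, far = ((node["left"], node["right"]) if diff < 0
                 else (node["right"], node["left"]))
    best = nearest(near, target, depth + 1, best)
    # Only descend the far side if the splitting plane is closer than the
    # best match found so far -- this is where the savings come from.
    if abs(diff) < dist(best):
        best = nearest(far, target, depth + 1, best)
    return best

pts = [(2, 3), (5, 4), (9, 6), (4, 7), (8, 1), (7, 2)]
tree = build_kdtree(pts)
print(nearest(tree, (9, 2)))  # → (8, 1)
```

For real workloads you'd reach for an existing implementation (e.g. scipy.spatial.KDTree) rather than rolling your own, but the pruning logic is the whole idea.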
Not to snipe at anyone, but skills like this seem pretty essential in solving problems in general. Little tricks from linear algebra like magnitude (what this post concerns) and projections show up in interesting and useful places.
Additionally, while they /are/ harder to access, reading and understanding various proofs in Math can be an even more beautiful and enlightening experience than just seeing the practical result.
Maybe it's the difference between realizing that setf can be used on all the generalized variables and actually reading the source and seeing why. Math is full of clever hacks.
Pythagoras' theorem only holds in Euclidean geometry (or so Wikipedia says). The computer world has no notion of space, so there's no a priori reason for choosing Pythagoras over other norms. Has anyone here experimented with alternatives?
I've used the 1-norm, aka "Manhattan", distance (with weights of course) for visual image search. I think the 1-norm is always a better choice than the Euclidean distance unless there is some clear geometric context.
The 1-norm tends to also be less sensitive to outliers, and in machine learning, 1-norm regularization leads to sparse solutions. The real reason 2-norm is popular is that it is easy to minimize (differentiable).
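A quick illustration of that outlier sensitivity, with made-up points: one large difference in a single dimension and small differences spread across every dimension sum to the same 1-norm, but the 2-norm is dominated by the single outlier:

```python
import math

def manhattan(a, b):
    """1-norm ("Manhattan") distance: sum of absolute coordinate differences."""
    return sum(abs(x - y) for x, y in zip(a, b))

def euclidean(a, b):
    """2-norm (Euclidean) distance."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

base = (0, 0, 0, 0)
small_everywhere = (2, 2, 2, 2)  # small difference in every dimension
one_outlier = (8, 0, 0, 0)       # one big difference, rest identical

# The 1-norm treats these the same (both sum to 8):
print(manhattan(base, small_everywhere), manhattan(base, one_outlier))  # 8 8
# The 2-norm is dominated by the outlier coordinate:
print(euclidean(base, small_everywhere), euclidean(base, one_outlier))  # 4.0 8.0
```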
Very simple, but there are some good examples here of how you might use this to quantify similarity between users based on their expressed preferences. Simple techniques are often best.
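For instance, here's a minimal sketch of preference-based similarity (the users, items, and ratings are all invented): compute the Euclidean distance over the items two users have both rated, then map it into (0, 1] so that identical tastes score 1 and the score shrinks as tastes diverge:

```python
import math

# Hypothetical 1-5 ratings from three users on the same items.
ratings = {
    "alice": {"film_a": 4, "film_b": 1, "film_c": 5},
    "bob":   {"film_a": 5, "film_b": 2, "film_c": 4},
    "carol": {"film_a": 1, "film_b": 5, "film_c": 2},
}

def similarity(u, v):
    """Similarity in (0, 1]: 1 for identical ratings, lower as they diverge."""
    shared = set(ratings[u]) & set(ratings[v])
    d = math.sqrt(sum((ratings[u][i] - ratings[v][i]) ** 2 for i in shared))
    return 1 / (1 + d)

print(similarity("alice", "bob"))    # close tastes -> higher score
print(similarity("alice", "carol"))  # opposite tastes -> lower score
```

Restricting the sum to items both users have rated matters in practice, since real preference data is sparse.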