Geometry and correlations

I recently got around to writing up some ideas I had been mulling over for quite some time: namely, how to optimally fit implicitly defined non-linear mathematical relations to data when the data has independent errors and the relation can have some intrinsic scatter around it as well. And what if the data also has some pre-existing geometry, such as measurements of angular position on the sky, which have a spherical geometry? Before figuring out how to optimally fit anything, we first have to figure out how to generalize the notion of intrinsic scatter.

The solution I propose is pretty straightforward: define intrinsic scatter using the geodesic distance from the submanifold defined by the relation we wish to fit. That is, a relation $latex f:\mathbb{R}^n\rightarrow\mathbb{R}^{n-k}$ defines some $latex k$-dimensional submanifold $latex S$ populated by the data points. However, the data may have some scatter in a direction orthogonal to $latex S$. One way of representing this orthogonal direction is to say that the data has some scatter along a geodesic orthogonal to $latex S$. The geodesic formulation gives a natural representation of orthogonal distance and direction even in the case of non-Euclidean geometry. One can then define the intrinsic scatter distribution on the normal space at each point of $latex S$. The normal space is the subspace of the tangent space that contains only the tangent vectors orthogonal to $latex S$ at a given point. Then, if we use Riemannian normal coordinates to represent the full manifold in the neighbourhood of a point of $latex S$, we can propagate this intrinsic distribution to the actual manifold where the data points reside. This may sound very technical, but when drawn as a figure it is geometrically pretty obvious. Check out Figures 1 to 3 in the paper, which should hopefully make the idea pretty clear.
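To make the construction a bit more concrete, here is a minimal sketch of the simplest spherical case I can think of. It is not the code from the paper. It assumes the unit sphere as the data manifold, the equator as the submanifold $latex S$, and a Gaussian intrinsic scatter of width `sigma` on the (here one-dimensional) normal space. The function names are made up for this illustration. The scatter is drawn in the normal space and then pushed onto the sphere along the orthogonal geodesic, which on the unit sphere is a great circle.

```python
import math
import random


def dot(a, b):
    """Euclidean dot product of two 3-vectors."""
    return sum(x * y for x, y in zip(a, b))


def exp_map(p, v, t):
    """Walk a geodesic distance t from the point p on the unit sphere,
    starting in the unit tangent direction v (v must be orthogonal to p)."""
    return tuple(math.cos(t) * pc + math.sin(t) * vc for pc, vc in zip(p, v))


def signed_distance_to_great_circle(x, pole):
    """Signed geodesic distance from x to the great circle whose pole is `pole`."""
    s = max(-1.0, min(1.0, dot(pole, x)))  # clamp against rounding error
    return math.asin(s)


def scatter_about_equator(n_points, sigma, seed=0):
    """Assumed toy model: S is the equator, intrinsic scatter is Gaussian
    along the geodesic orthogonal to S (illustrative choice only)."""
    rng = random.Random(seed)
    pole = (0.0, 0.0, 1.0)  # the normal direction at every point of the equator
    points, offsets = [], []
    for _ in range(n_points):
        lon = rng.uniform(0.0, 2.0 * math.pi)
        p = (math.cos(lon), math.sin(lon), 0.0)  # a point on S
        t = rng.gauss(0.0, sigma)                # offset drawn in the normal space
        points.append(exp_map(p, pole, t))       # propagated onto the sphere
        offsets.append(t)
    return points, offsets, pole


points, offsets, pole = scatter_about_equator(n_points=100, sigma=0.1)
recovered = [signed_distance_to_great_circle(x, pole) for x in points]
```

In this toy setup the geodesic distance of each generated point from $latex S$ recovers the offset that was drawn in the normal space, as long as the offset stays below $latex \pi/2$ in absolute value. That is the sense in which the scatter is defined along the orthogonal geodesic rather than along a straight line in the embedding space.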